Abstract

Following recent suggestions in the literature, we interpret point forecasts as functionals (i.e., point summaries) of predictive distributions. We consider the situation of unknown directives, and estimate the functional, which might vary over time as a function of certain state variables, using the generalized method of moments. Focusing on two classes of state-dependent functionals, quantiles and expectiles, we show that our estimators are identifiable, consistent, and asymptotically normal. We construct specification and rationality tests for forecasts, and propose a novel approach to combine point forecasts. In a data example, we show that the gross domestic product (GDP) Greenbook forecasts of the U.S. Federal Reserve can be interpreted as an evolving quantile that depends on the current growth rate. Based on these findings, we construct an improved GDP mean forecast. Using simulated data, we demonstrate that our rationality test is better calibrated and more powerful than existing approaches.
1 Introduction


Forecasts are often the basis of crucial decisions. Yet, they are fraught with uncertainty due to imperfect observation, understanding, and modeling of the underlying mechanisms. To fully account for this uncertainty, it is increasingly recognized that forecasts should be probabilistic in nature. If forecasts are issued in the form of predictive distributions, it is straightforward to compute the action that maximizes the expected utility, to test for optimality, or to compare them to other forecasts (see Gneiting and Katzfuss, 2014, for a recent review of probabilistic forecasting).

However, point forecasts are still ubiquitous. They can be treated under the probabilistic framework by interpreting them as a functional (i.e., a point-valued summary of a distribution) of the density forecasts, such as the mean or median. In an economic context, Elliott and Timmermann (2008) and Engelberg et al. (2009) emphasize the importance of integrating forecasts into a decision-theoretic framework. A functional can be expressed in terms of the risk (i.e., the expected loss) that the functional minimizes. For example, if the directive consists of a quadratic loss function, the optimal point forecast is the mean of the predictive distribution. Knowing which functional was reported by the forecaster allows for proper interpretation, evaluation, testing, and comparison of point forecasts.

We consider point forecasts with an unknown directive, for which the forecaster only implicitly reported a certain functional of the distribution representing the forecast uncertainty, and no loss function was explicitly specified or communicated. This situation arises, for example, with expert forecasts or vague questions in surveys. Another important example of forecasts with unknown directives is output from complex computer models, which are often tuned by multiple individuals to achieve forecasts that the individuals perceive as optimal in a way that might be neither transparent nor clearly defined. Such forecasts would be most informative if the user knew the directive under which the forecast was issued. Our goal here is to estimate the functional or directive from a time series of point forecasts and associated realizations. Once the directive has been estimated, the point forecasts can be coherently interpreted, improved, tested, or compared to other point or probability forecasts.

Past work on estimating a directive based on point forecasts and realizations has focused on the estimation of the loss function.^1 Lieli and Stinchcombe (2013) discuss the recoverability of the loss function in the binary case. Elliott et al. (2005) provide a generalized method of moments (GMM) estimator of the loss function for constant preferences and linear forecasting models. Patton and Timmermann (2007b) apply this method to the U.S. Federal Reserve's GDP forecasts with a new class of loss functions, which consists of quadratic splines flexible with respect to a state variable. Recently, piecewise linear and quadratic loss functions have been used in various economic applications (Caunedo et al., 2013; Christodoulakis and Mamatzakis, 2008; Elliott et al., 2008; Fritsche et al., 2015; Krol, 2013; Pierdzioch et al., 2013). In a neuroscience application, Körding and Wolpert (2004) estimate the loss function implicit in human sensorimotor control by varying targets in an experimental task. Sims (2015) uses a similar approach to infer the implicit loss function of visual working memory.

We demonstrate that the loss function is, in fact, not identifiable. Hence, instead of estimating the loss function itself, we propose to estimate the functional. By restricting the class of feasible functionals adequately, we ensure that our estimator is identified. As such, we generalize the findings of Elliott et al. (2005) to forecasts that are not required to be linear functions of the instrumental variables, and we include state-dependent forecasting directives as first proposed in Patton and Timmermann (2007b). While our paper focuses on the time series setting, the considerations apply in other contexts as well.


Estimating a functional is a more flexible approach than that of Elliott et al. (2005), 
and allows for the construction of a wealth of specification tests. The results are easy to 
interpret, and can be communicated to decision makers with little or no statistical training. 
Further, the estimator induces better calibrated and more powerful rationality tests, as 
compared to the approach of Patton and Timmermann (2007b). The new approach allows 
for the interpretation of systematically biased forecasts as state-dependent quantiles, for 
performance comparisons between point and probability forecasts, for backtesting value-at-risk
forecasts, and for the construction of density forecasts from multiple point forecasts. In
particular, we show that the GDP Greenbook forecast of the U.S. Federal Reserve can be 
interpreted as a state-dependent quantile, and we construct an improved mean forecast. 

This manuscript is organized as follows. Section 2 introduces the setting and discusses the relevance and identification of state-dependent functionals. In Section 3, we describe the methodology, and prove consistency and asymptotic normality of our estimators. Section 4 connects the new findings to related topics, including rationality tests and the combination of multiple forecasts. In Section 5, we compare our methodology to the approach of Patton and Timmermann (2007b), using both simulated data and the GDP Greenbook forecasts. Section 6 serves as a discussion. Proofs are provided in the Appendix.


2 Notation, background, and motivation 

Consider a stochastic process $\{(Y_t, Z_t) : t = 1, 2, \ldots\}$. A forecaster attempts to predict the random variable $Y_t$ based on information contained in the σ-algebra $\mathcal{F}_t$, which encodes the information available to the forecaster when issuing the forecast. In the situation of an $h$-step ahead forecast, the available information is typically generated by the lagged variables of the outcome $Y$ and the vector-valued state variable $Z$, in which case

$$\mathcal{F}_t = \sigma(Y_1, \ldots, Y_{t-h}, Z_1, \ldots, Z_{t-h}), \quad t = h+1, h+2, \ldots.$$

We denote the conditional predictive distribution by $\mathbb{P}_t = \mathcal{L}(Y_t \mid \mathcal{F}_t)$, but the forecaster only reports a point forecast $X_t = \alpha(\mathbb{P}_t)$, which is a functional (i.e., a real-valued summary) of the predictive distribution $\mathcal{L}(Y_t \mid \mathcal{F}_t)$.

Now, crucially, we assume that we know neither which functional was used by the forecaster nor what the predictive distributions $\mathbb{P}_t$ were. In line with seminal work on professional economic forecasters (Elliott et al., 2005; Patton and Timmermann, 2007b), we merely assume that the underlying predictive distributions constitute the conditional distributions based on some information set available to the forecaster when issuing the forecast for $Y_t$. Throughout this manuscript, random variables are written in upper case letters and their realizations in lower case letters. We use the short notation $\alpha(Y_t \mid \mathcal{F}_t)$ for $\alpha(\mathcal{L}(Y_t \mid \mathcal{F}_t))$, as is customary for the mean functional, where $\mathbb{E}[Y_t \mid \mathcal{F}_t]$ stands for $\mathbb{E}_{Y_t \sim \mathcal{L}(Y_t \mid \mathcal{F}_t)}[Y_t]$. We assume single-valued functionals; under additional technical considerations, the results extend to the set-valued case. Statements about all time points are often denoted without subscripts. For example, we write $X = \alpha(Y \mid \mathcal{F})$ instead of $X_t = \alpha(Y_t \mid \mathcal{F}_t)$ for all $t$. If $R$ is a random variable, $R \in \mathcal{F}$ means that $R$ is $\mathcal{F}$-measurable. The partial derivative of a function $g(x, y)$ with respect to $x$ is denoted as $g_{(x)}(x, y)$.

We allow the functional $\alpha$ to depend on the current situation, represented by the $\mathcal{F}$-measurable state variable $Z$. We call this a state-dependent functional with state variable $Z$ and write $\alpha_Z(Y \mid \mathcal{F})$.


2.1 Probabilistic forecasts, functionals, and loss functions 


Uncertainty is commonly represented by probability distributions. If forecasters are able to issue the full probability distribution for the events they predict, this is most informative for any forecast user, regardless of the utility function at hand (Diebold and Lopez, 1996a). Examples of probabilistic reports include the Survey of Professional Forecasters, which elicits predictive densities of inflation and GDP growth from macroeconomic forecasters. Lately, some household surveys also elicit probabilistic predictions of various future events (Manski, 2004).


However, unless restricted by a parametric family, the predictive distribution $\mathbb{P}$ is an infinite-dimensional object, which cannot be fully described by a single point forecast. Consequently, the recipient of a point forecast observes only an ambiguous summary of the predictive distribution. If forecasts are meant to inform different forecast users, or a single forecast user with unknown or time-varying preferences, a point prediction contains less information than a probabilistic prediction.

The interpretation of a point forecast requires additional assumptions on the decision process that forecasters use to generate their predictions (Manski, 2016). The most common, but not always appropriate, approach is to assume that the point forecast represents the mean of the conditional distribution given an information set, i.e.,

$$X_t = \mathbb{E}[Y_t \mid \mathcal{F}_t].$$

A functional $\alpha$ is elicitable with respect to a class of probability distributions $\mathcal{P}$ if there exists a loss function $L(x, y)$ such that the risk-minimizing point forecast equals the functional for every distribution in the class, i.e.,

$$\alpha(\mathbb{P}) = \arg\min_{x \in \mathbb{R}} \mathbb{E}_{\mathbb{P}}[L(x, Y)] \quad \text{for all } \mathbb{P} \in \mathcal{P}.$$


Any such loss function is called consistent for the functional $\alpha$. Conversely, every loss function with a unique risk minimizer induces an elicitable functional. For example, the quadratic loss $L(x, y) = (x - y)^2$ is consistent for the mean functional with respect to the class of probability distributions with finite first moments. In fact, for most commonly used functionals, including any quantile and expectile, there exist many consistent loss functions (Ehm et al., 2016; Gneiting, 2011).
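A quick numerical check of elicitability may help. The sketch below uses a toy sample and a finite grid of candidate forecasts, both chosen purely for illustration. It minimizes the empirical risk under the quadratic loss and under the absolute loss $|x - y|$. The absolute loss is a standard consistent loss for the median and is assumed here; it is not taken from the text above. The minimizers coincide with the sample mean and the sample median, respectively.

```python
import statistics


def empirical_risk(x, sample, loss):
    """Average loss of the point forecast x over the observed sample."""
    return sum(loss(x, y) for y in sample) / len(sample)


def risk_minimizer(sample, loss, grid):
    """Candidate forecast on `grid` with the smallest empirical risk."""
    return min(grid, key=lambda x: empirical_risk(x, sample, loss))


def quadratic_loss(x, y):
    return (x - y) ** 2


def absolute_loss(x, y):
    return abs(x - y)


sample = [0.0, 1.0, 2.0, 4.0, 13.0]        # toy, right-skewed sample
grid = [i / 100 for i in range(1501)]      # candidate forecasts 0.00, 0.01, ..., 15.00

print(risk_minimizer(sample, quadratic_loss, grid), statistics.mean(sample))   # 4.0 4.0
print(risk_minimizer(sample, absolute_loss, grid), statistics.median(sample))  # 2.0 2.0
```

Because the toy sample is skewed, the two loss functions induce different functionals. The same data therefore yield different optimal point forecasts.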


2.2 Asymmetric state-dependent functionals 

We now discuss why the commonly postulated mean functional is not sufficient in many situations, and why there is a need for asymmetric^2 and state-dependent functionals.

^2 We define a functional $\alpha$ as symmetric if, for every symmetric distribution $\mathbb{P}$ with symmetry point $c$, it holds that $c = \alpha(\mathbb{P})$. For example, the mean and the median are symmetric functionals.
There is a vast literature providing evidence for asymmetric loss functions (Christoffersen and Diebold, 1997; Christoffersen et al., 1996; Skouras, 2007). In a recent example, Ehm et al. (2016) attach economic meaning to the classes of asymmetric functionals given by quantiles and expectiles: An agent facing the decision to invest in a certain project can identify the profit-maximizing strategy with knowledge of only the quantile or expectile of the profit distribution. The asymmetry arises, for example, in the context of tax reductions connected to losses. If the decision involves no such asymmetries, the project is profitable if and only if the expected profit exceeds the necessary capital. If the level of tax reduction in the setting of Ehm et al. (2016) depends on profits, the optimal decision is a function of a state-dependent quantile or expectile of the profit distribution.


Patton and Timmermann (2007b) find that the Greenbook GDP forecasts are rational with respect to an asymmetric loss function, where the level of asymmetry depends on the current growth level (e.g., recessions are associated with overly conservative forecasts). In the household consumption setting, Andersen et al. (2008) find state-dependent risk preferences with respect to personal finances.

Another source of asymmetric and state-dependent functionals is asymmetric information. Even under a symmetric mean forecast, asymmetric information may lead to asymmetric and state-dependent functionals relative to the information set of the forecast user. See the Appendix for an illustrative example.


2.3 Identification of unknown functionals 


Steinwart et al. (2014) show that (modulo regularity conditions) every continuous and elicitable functional $\alpha$ has an identification function $V(x, y)$ such that

$$x = \alpha(\mathbb{P}) \iff \mathbb{E}_{\mathbb{P}}[V(x, Y)] = 0,$$

where $V$ equals almost surely the partial derivative $L_{(x)}(x, y)$ of any loss function $L$ consistent for $\alpha$.^3 We use this result to construct moment conditions for any elicitable functional. The respective Regularity Conditions 1, which are detailed in the Appendix, are satisfied in the special case of the mean functional, provided the conditional distribution $\mathcal{L}(Y \mid \mathcal{F})$ has a finite first moment.


Lemma 1 (Identification). Assume

$$X = \alpha(Y \mid \mathcal{F}),$$

where $\mathcal{F}$ is some σ-algebra and $\alpha : \mathcal{P} \to \mathbb{R}$ is an elicitable functional. Under Regularity Conditions 1, there exists an identification function $V$ satisfying the following moment condition for any $\mathcal{F}$-measurable random vector $W$:

$$\mathbb{E}[V(X, Y) \cdot W] = 0. \tag{1}$$

Furthermore, for all $y$, the identification function $V(\cdot, y)$ equals Lebesgue almost everywhere the partial derivative $L_{(x)}(\cdot, y)$ of any loss function $L$ that is consistent for $\alpha$, up to an $\mathcal{F}$-measurable scaling factor.


^3 For details, see Theorem 9 in Steinwart et al. (2014).
The regularity conditions and the proof can be found in the Appendix. The intuition is straightforward. The functional $\alpha$ is elicitable; thus, there exists a loss function $L$ such that

$$X = \arg\min_{x} \mathbb{E}[L(x, Y) \mid \mathcal{F}].$$

Neglecting regularity conditions, the first-order condition is

$$\mathbb{E}[L_{(x)}(X, Y) \mid \mathcal{F}] = 0,$$

which, by the law of iterated expectations, in turn implies $\mathbb{E}[L_{(x)}(X, Y) \cdot W] = 0$ for all $W \in \mathcal{F}$. By Lemma 1, we may identify any elicitable functional with the partial derivatives of its consistent loss functions. There is no need to distinguish between loss functions that are consistent for the same functional, as every consistent loss function leads to the same identification function modulo a multiplicative $\mathcal{F}$-measurable factor. Consequently, it is impossible to distinguish between different loss functions that are consistent for the same functional.
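To illustrate the moment condition (1), the following sketch simulates a forecaster who knows the state $z_t$ and reports the 0.8-quantile of a predictive distribution $N(z_t, 1)$. The sketch uses the quantile identification function $V(x, y) = \mathbf{1}(y \le x) - \tau$ of Section 2.4. The Gaussian model, the seed, and the instruments $W = (1, Z)$ are assumptions made only for this example. At the correct level, the sample analogues of $\mathbb{E}[V(X, Y) \cdot W]$ are close to zero. At a wrong level, they are clearly nonzero.

```python
import random
from statistics import NormalDist


def quantile_identification(x, y, tau):
    """V(x, y) = 1(y <= x) - tau, identification function of the tau-quantile."""
    return (1.0 if y <= x else 0.0) - tau


def sample_moments(xs, ys, ws, tau):
    """Empirical analogue of E[V(X, Y) * W], one entry per instrument."""
    T = len(xs)
    return [
        sum(quantile_identification(x, y, tau) * w[j] for x, y, w in zip(xs, ys, ws)) / T
        for j in range(len(ws[0]))
    ]


rng = random.Random(1)
T, tau = 20000, 0.8
zs = [rng.gauss(0.0, 1.0) for _ in range(T)]       # state variable, known to the forecaster
ys = [z + rng.gauss(0.0, 1.0) for z in zs]         # hypothetical model: Y | F ~ N(z, 1)
xs = [z + NormalDist().inv_cdf(tau) for z in zs]   # reported 0.8-quantile of N(z, 1)
ws = [(1.0, z) for z in zs]                         # instruments W = (1, Z)

right = sample_moments(xs, ys, ws, tau)   # both entries close to 0
wrong = sample_moments(xs, ys, ws, 0.5)   # first entry close to 0.8 - 0.5 = 0.3
print(right, wrong)
```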

We call the components of the random vector $W$ instruments, which can be chosen from the information available to the forecaster when the prediction was issued. See Section 4.1 for further discussion.


2.4 State-dependent quantiles and expectiles 

Lemma 1 illustrates that the specific loss function connected to the functional is not identified. In contrast, the functional is uniquely identified given sufficient heterogeneity in the predictive distributions.^4 We focus on two classes of functionals, quantiles and expectiles, for which we can establish unique identification for a broad class of distributions.

The $\tau$-quantile $q_\tau(\mathbb{P})$ of a distribution $\mathbb{P}$ with continuous and strictly increasing cumulative distribution function is the unique solution $x$ to $\mathbb{P}((-\infty, x]) = \tau$. We can express this directly in terms of the identification function of the $\tau$-quantile, namely $V(x, y) = \mathbf{1}(y \le x) - \tau$:

$$x = q_\tau(\mathbb{P}) \iff \mathbb{E}_{\mathbb{P}}[\mathbf{1}(Y \le x) - \tau] = 0.$$
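The empirical version of $x \mapsto \mathbb{E}_{\mathbb{P}}[\mathbf{1}(Y \le x) - \tau]$ is nondecreasing in $x$. A $\tau$-quantile can therefore be recovered by solving the identification equation numerically. The sketch below uses plain bisection on the empirical identification function. The helper name, the tolerance, and the toy data are illustrative choices, not part of the method described here.

```python
import random
from statistics import NormalDist


def empirical_quantile(ys, tau, tol=1e-10):
    """Bisection on the empirical identification function x -> mean(1(y <= x)) - tau,
    which is nondecreasing in x; returns (up to tol) the smallest x where it is >= 0."""
    def mean_v(x):
        return sum(1.0 for y in ys if y <= x) / len(ys) - tau

    lo, hi = min(ys) - 1.0, max(ys)   # mean_v(lo) = -tau < 0 <= 1 - tau = mean_v(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_v(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi


print(empirical_quantile([float(k) for k in range(1, 11)], 0.35))   # approximately 4.0

rng = random.Random(2)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(empirical_quantile(sample, 0.9), NormalDist().inv_cdf(0.9))  # both near 1.28
```

For a discrete sample, the bisection returns the smallest sample point at which the empirical distribution function reaches $\tau$, i.e., the usual generalized inverse.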


While quantiles are asymmetric generalizations of the median, expectiles are analogously defined as asymmetric generalizations of the mean. Specifically, the $\tau$-expectile $e_\tau(\mathbb{P})$ of a distribution $\mathbb{P}$ with finite mean was introduced in Newey and Powell (1987) as the unique solution $x = e_\tau(\mathbb{P})$ of the equation

$$\frac{\tau}{1 - \tau} = \frac{\int_{-\infty}^{x} (x - y)\, dF(y)}{\int_{x}^{\infty} (y - x)\, dF(y)},$$

where $F$ denotes the cumulative distribution function of $\mathbb{P}$. This is equivalent to

$$\mathbb{E}_{\mathbb{P}}\big[\, |\mathbf{1}(Y < x) - \tau|\, (x - Y) \big] = 0,$$

which reveals the corresponding identification function, namely $V(x, y) = |\mathbf{1}(x > y) - \tau|\, (x - y)$.

^4 Limited classes of distributions do not necessarily identify the functional. Consider the mean and the median as an example. While being different functionals, they are identical on the set of symmetric distributions and therefore cannot be uniquely identified on such a limited set of distributions.
If the second moment of $\mathbb{P}$ is finite, $e_\tau(\mathbb{P})$ equals the risk-minimizing action under the quad-quad loss function $L_\tau(x, y) = |\mathbf{1}(x > y) - \tau|\, (x - y)^2$. As argued in Lemma 1, the identification function equals the derivative of the consistent loss function almost surely, up to a scaling factor. Here the factor is 2, since $\partial L_\tau(x, y) / \partial x = 2\, |\mathbf{1}(x > y) - \tau|\, (x - y)$ for $x \neq y$.
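The same root-finding idea applies to expectiles. Their empirical identification function $x \mapsto \frac{1}{n} \sum_i |\mathbf{1}(x > y_i) - \tau|\, (x - y_i)$ is continuous and increasing in $x$. The sketch below, with illustrative helper names and toy data, checks two properties. First, the 0.5-expectile equals the sample mean. Second, the root minimizes the empirical quad-quad risk.

```python
import statistics


def expectile(ys, tau, tol=1e-12):
    """Root of the empirical identification function
    x -> mean(|1(x > y) - tau| * (x - y)), found by bisection."""
    def mean_v(x):
        return sum(abs((1.0 if x > y else 0.0) - tau) * (x - y) for y in ys) / len(ys)

    lo, hi = min(ys), max(ys)          # mean_v(lo) <= 0 <= mean_v(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_v(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quad_quad_risk(x, ys, tau):
    """Empirical risk under L_tau(x, y) = |1(x > y) - tau| * (x - y)^2."""
    return sum(abs((1.0 if x > y else 0.0) - tau) * (x - y) ** 2 for y in ys) / len(ys)


sample = [0.0, 1.0, 2.0, 4.0, 13.0]
print(expectile(sample, 0.5), statistics.mean(sample))   # the 0.5-expectile is the mean
e80 = expectile(sample, 0.8)
print(e80, quad_quad_risk(e80, sample, 0.8), quad_quad_risk(e80 + 0.01, sample, 0.8))
```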

Let $z$ denote the state variable introduced at the beginning of Section 2. We allow for additional flexibility and let the level of the quantile or expectile depend on the state variable $z$ via a parametric function $m(z, \theta)$ mapping into the unit interval $(0, 1)$ for every $\theta \in \Theta$, and we call this function the specification model. Specifically, the class of state-dependent quantiles is given by

$$\mathcal{T}_q(m) := \{ q_{m(z,\theta)}(\mathbb{P}) \mid \theta \in \Theta \}.$$

Analogously, we define the class of state-dependent expectiles as

$$\mathcal{T}_e(m) := \{ e_{m(z,\theta)}(\mathbb{P}) \mid \theta \in \Theta \},$$

where the level $m$ is again a function of the state variable $z$ and the parameter $\theta$. Note that this includes the special case $m(z, \theta) = \theta$, which assumes that the forecaster always states the same functional, as in previous work (Christodoulakis and Mamatzakis, 2008; Elliott et al., 2005; Fritsche et al., 2015; Krol, 2013; Pierdzioch et al., 2013).


3 Estimation 

In this section, we use the identification property of Lemma 1 to construct a uniquely identified GMM estimator in the stochastic process setting. Given a sample $(x_t, y_t, z_t)_{t=1,2,\ldots,T}$ of forecasts, observations, and state variables, our goal is to infer the functional that the point forecasts $x_1, \ldots, x_T$ represent. Our approach can be summarized as follows:

1. Define a specification model $m$ and the respective class $\mathcal{T}$ of state-dependent quantiles or expectiles. We suppose that the class contains the true reported functional $\alpha_{m(z,\theta_0)}$.

2. For each functional $\alpha_{m(z,\theta)} \in \mathcal{T}$, use the identification function $V$ and some instruments $w$ to generate the moment function $g(x, y; z, \theta) = V(x, y, z, \theta) \cdot w$ with the property

$$\mathbb{E}[g(X, Y; Z, \theta)] = 0 \iff X = \alpha_{m(Z,\theta)}(\mathbb{P}). \tag{2}$$

3. Find $\hat\theta$, the consistent GMM estimator of $\theta_0$, and its asymptotic distribution, which induce the estimator $\alpha_{m(z,\hat\theta)}$ of the functional $\alpha_{m(z,\theta_0)}$.

3.1 Generalized method of moments estimation of state-dependent functionals

Given a class $\mathcal{T}$ of state-dependent functionals with corresponding identification functions $V$ and a vector of $\mathcal{F}$-measurable instrumental variables $w = (w_1, \ldots, w_q)$, let

$$\mathcal{D}_T = \{ (x_t, y_t, z_t, w_{1,t}, \ldots, w_{q,t}) : t = 1, \ldots, T \}$$

denote the set of all data available at time $T$. We obtain the GMM estimator of $\theta$ as

$$\hat\theta_T := \arg\min_{\theta \in \Theta} \| g_T(\theta, \mathcal{D}_T) \|, \tag{3}$$

i.e., as the value that minimizes some norm of the empirical mean of the moment function $g$,

$$g_T(\theta, \mathcal{D}_T) = \frac{1}{T} \sum_{t=1}^{T} g(x_t, y_t; z_t, \theta) = \frac{1}{T} \sum_{t=1}^{T} V(x_t, y_t, z_t, \theta) \cdot w_t.$$
Several choices for the norm in (3) are possible, but here we only consider the Mahalanobis norm $\|x\|_{S^{-1}} := x' S_T^{-1} x$. Its weighting matrix $S_T^{-1}$ is a consistent estimator of the inverse of the covariance matrix of the moment function $g$. It is based on the heteroskedasticity and autocorrelation consistent (HAC) covariance estimator proposed by Newey and West (1987, pp. 703-708):

$$S_T = S_T(\theta) = \sum_{t=-(T-1)}^{T-1} k_h(t)\, \Gamma_t(\theta),$$

where $k_h(s)$ is a kernel with bandwidth $h$, $\Gamma_t(\theta) = \frac{1}{T} \sum_{i=1}^{T-t} g_i(\theta)\, g_{i+t}(\theta)'$ for $t \ge 0$ with $g_i(\theta) := g(x_i, y_i; z_i, \theta)$, and $\Gamma_{-t}(\theta) = \Gamma_t(\theta)'$.

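To find $\hat\theta_T$ in (3), we apply the two-step GMM procedure proposed in Hansen (1982):

1. Compute $\tilde\theta = \arg\min_{\theta \in \Theta} g_T(\theta)' g_T(\theta)$.

2. Compute the HAC matrix $S_T(\tilde\theta)$.

3. Compute $\hat\theta_T = \arg\min_{\theta \in \Theta} g_T(\theta)' S_T(\tilde\theta)^{-1} g_T(\theta)$.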
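The three-step procedure above can be sketched for the simplest specification model, a constant level $m(z, \theta) = \theta$ in the quantile class, with instruments $w_t = (1, z_t)$. In that case $g_T(\theta) = a - \theta b$ is linear in $\theta$. Both minimizations therefore have closed forms, and no numerical optimizer is needed. The following choices are all assumptions for this illustration, not choices made in the text: the simulated design (an AR(1) state, Gaussian predictive distributions, true level 0.7), the Bartlett kernel with bandwidth $h = 4$, and the function names.

```python
import random
from statistics import NormalDist


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def newey_west(g, h):
    """HAC matrix with Bartlett weights k_h(l) = 1 - |l| / (h + 1):
    S_T = Gamma_0 + sum_{l=1}^{h} k_h(l) (Gamma_l + Gamma_l'),
    where Gamma_l = (1/T) sum_i g_i g_{i+l}'."""
    T, q = len(g), len(g[0])
    S = [[0.0] * q for _ in range(q)]
    for lag in range(h + 1):
        weight = 1.0 - lag / (h + 1)
        for i in range(q):
            for j in range(q):
                gamma_ij = sum(g[t][i] * g[t + lag][j] for t in range(T - lag)) / T
                S[i][j] += weight * gamma_ij
                if lag > 0:
                    S[j][i] += weight * gamma_ij  # Gamma_{-l} = Gamma_l'
    return S


def two_step_gmm_constant_quantile(xs, ys, ws, h=4):
    """Two-step GMM for the constant level m(z, theta) = theta with moments
    g_t(theta) = (1(y_t <= x_t) - theta) * w_t, so that g_T(theta) = a - theta * b."""
    T, q = len(xs), len(ws[0])
    hits = [1.0 if y <= x else 0.0 for x, y in zip(xs, ys)]
    a = [sum(hit * w[j] for hit, w in zip(hits, ws)) / T for j in range(q)]
    b = [sum(w[j] for w in ws) / T for j in range(q)]
    theta_tilde = dot(b, a) / dot(b, b)                      # step 1: identity weighting
    g = [[(hit - theta_tilde) * wj for wj in w] for hit, w in zip(hits, ws)]
    S = newey_west(g, h)                                     # step 2: HAC matrix at theta_tilde
    return dot(b, solve(S, a)) / dot(b, solve(S, b))         # step 3: weighting with S^{-1}


# Hypothetical simulation: AR(1) state, Y | F ~ N(z, 1), forecaster reports the 0.7-quantile.
rng = random.Random(42)
T, tau0 = 5000, 0.7
zs, z = [], 0.0
for _ in range(T):
    z = 0.5 * z + rng.gauss(0.0, 1.0)
    zs.append(z)
ys = [zt + rng.gauss(0.0, 1.0) for zt in zs]
xs = [zt + NormalDist().inv_cdf(tau0) for zt in zs]
ws = [(1.0, zt) for zt in zs]                                # instruments w_t = (1, z_t)
theta_hat = two_step_gmm_constant_quantile(xs, ys, ws)
print(round(theta_hat, 3))                                   # close to 0.7
```

Nonlinear specification models, such as the logistic model of Section 4.2, would instead require a numerical optimizer in steps 1 and 3.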


3.2 Consistency 


If we restrict the eligible functionals to either the class of state-dependent quantiles $\mathcal{T}_q(m)$ or expectiles $\mathcal{T}_e(m)$, the GMM estimator described in Section 3.1 is consistent. That is, we assume there exists a $\theta_0 \in \Theta$ such that the true forecasting directive is the $m(z, \theta_0)$-quantile or -expectile.

Theorem 1 (Consistency). Let $m(z, \theta)$ be a mapping to the interval $(0, 1)$. Let the time series $(x_t, y_t)_{t=1,2,\ldots}$ consist of realizations and point forecasts derived as state-dependent quantiles of a conditional distribution:

$$X = q_{m(Z,\theta_0)}(Y \mid \mathcal{F}),$$

where $\theta_0 \in \Theta$ is the true parameter. Assume the regularity conditions stated in the Appendix hold, and that the alternative models differ from the true model on a set with positive probability,

$$\mathbb{P}(m(Z, \theta_0) \neq m(Z, \theta)) > 0 \quad \text{for all } \theta \in \Theta \text{ with } \theta \neq \theta_0.$$

Then there exists an $\mathcal{F}_t$-measurable instrumental variable $W_t$ such that the GMM estimator

$$\hat\theta_T := \arg\min_{\theta \in \Theta} \| g_T(\theta, \mathcal{D}_T) \|$$

is a consistent estimator of $\theta_0$.

Analogously, the estimator is consistent if the point forecast is a state-dependent expectile of a conditional distribution:

$$X = e_{m(Z,\theta_0)}(Y \mid \mathcal{F}),$$

where $\theta_0 \in \Theta$ is the true parameter.

The regularity conditions and the proof are given in the Appendix. The choice of instrumental variables $w$ is discussed in Section 4.1, and some examples of specification models $m$ are introduced in Section 4.2.


3.3 Asymptotic normality 

Once the identification of the system is established, the GMM theory provides a range of useful asymptotic results.

Theorem 2 (Asymptotic normality). Under the regularity conditions stated in the Appendix, the GMM estimator is asymptotically normally distributed,

$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma), \quad \Sigma := (G' S^{-1} G)^{-1}, \quad \text{as } T \to \infty,$$

where $G := \mathbb{E}[g_{(\theta)}(X, Y; Z, \theta_0)]$ is the expectation of the partial derivative of the moment function and $S := \operatorname{Var}[g(X, Y; Z, \theta_0)]$ is its variance.

The regularity conditions and the proof are given in the Appendix. When analyzing any specific model, we can plug in the model function $m$ and obtain $G$ for the quantile class as

$$G = -\mathbb{E}[m_{(\theta)}(Z, \theta_0) \cdot W],$$

and analogously for expectiles, we have

$$G = -\mathbb{E}[m_{(\theta)}(Z, \theta_0) \cdot |X - Y| \cdot W].$$
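For the constant-level quantile model $m(z, \theta) = \theta$, the moment function is $g = (\mathbf{1}(y \le x) - \theta)\, w$, whose derivative with respect to $\theta$ is $-w$. Hence $G = -\mathbb{E}[W]$, and the efficient two-step GMM covariance $\Sigma = (G' S^{-1} G)^{-1}$ is a scalar. The sketch below estimates $\Sigma$ and the standard error $\sqrt{\Sigma / T}$ on simulated, serially independent data. In that setting, $S$ can be estimated by the average outer product of the moments instead of a HAC estimator. The design and the seed are assumptions for illustration, as is the first-step estimate, which for brevity is the sample hit rate (using only the constant instrument). Because the hits are independent of $z$ here, $\Sigma$ should be close to $\tau_0 (1 - \tau_0) = 0.21$.

```python
import math
import random
from statistics import NormalDist


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


rng = random.Random(7)
T, tau0 = 5000, 0.7
zs = [rng.gauss(0.0, 1.0) for _ in range(T)]                # serially independent state
ys = [z + rng.gauss(0.0, 1.0) for z in zs]                  # hypothetical: Y | F ~ N(z, 1)
xs = [z + NormalDist().inv_cdf(tau0) for z in zs]           # constant 0.7-quantile forecasts
ws = [(1.0, z) for z in zs]                                  # instruments (1, z)

hits = [1.0 if y <= x else 0.0 for x, y in zip(xs, ys)]
theta_hat = sum(hits) / T                                   # consistent first-step estimate
g = [[(hit - theta_hat) * wj for wj in w] for hit, w in zip(hits, ws)]

# m_(theta) = 1, so G = -E[W] (a 2 x 1 vector).
G = [-sum(w[j] for w in ws) / T for j in range(2)]
# Independent data: S is the average outer product of the moment vectors.
S = [[sum(gt[i] * gt[j] for gt in g) / T for j in range(2)] for i in range(2)]
S_inv = inv2(S)
Sigma = 1.0 / sum(G[i] * S_inv[i][j] * G[j] for i in range(2) for j in range(2))
se = math.sqrt(Sigma / T)
print(round(Sigma, 3), round(se, 4))   # Sigma near tau0 * (1 - tau0) = 0.21
```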


4 General applications of state-dependent functionals 

Once a point forecast has been convincingly connected to a functional, classical results from 
the literature on forecast evaluation based on mean forecasts extend readily to the general 
framework. In this section, we illustrate some possible applications. 


4.1 Tests of forecast rationality 

The literature on the evaluation of point forecasts typically defines optimality or rationality of a point forecast in terms of minimizing some expected loss (Elliott et al., 2005; Patton and Timmermann, 2007b). As shown in the Appendix, this concept of rationality corresponds to the risk-minimizing rule over a conditional predictive distribution. This perspective allows a more profound analysis and interpretation: A point forecast can violate the rationality criteria for two reasons. Either the subjective loss of the forecaster is not consistent with the assumed forecasting directives, or the subjective perception of uncertainty is inconsistent. In the latter case, based on the available information, the forecaster perceives a probability distribution different from the actual conditional distribution.


The so-called test of overidentifying restrictions (Hansen, 1982) is commonly used to analyze rationality of a forecast. We define the J-statistic

$$J_T(\theta) := T\, g_T(\theta)' S_T(\theta)^{-1} g_T(\theta). \tag{4}$$


Theorem 3. If the true reported functional is a state-dependent quantile or a state-dependent expectile,

$$X = q_{m(Z,\theta_0)}(Y \mid \mathcal{F}) \quad \text{or} \quad X = e_{m(Z,\theta_0)}(Y \mid \mathcal{F}),$$

and if the regularity conditions stated in the Appendix hold, then the J-statistic is asymptotically chi-square distributed,

$$J_T(\hat\theta_T) \xrightarrow{d} \chi^2_{q-p} \quad \text{as } T \to \infty,$$

where the degrees of freedom are determined by the number of instruments, $q$, and the number of parameters, $p$.

The regularity conditions and the proof are given in the Appendix.
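A minimal sketch of the rationality test based on the J-statistic (4) follows. It again uses the constant-level quantile model with instruments $(1, z_t)$ and serially independent simulated data. Then $q - p = 1$, and the p-value is $P(\chi^2_1 > J) = \operatorname{erfc}(\sqrt{J / 2})$. The second simulated forecaster reports a quantile whose level $\Lambda(0.5 + z_t)$ moves with the state. The constant-level model cannot absorb this, and the J-statistic becomes large. All design choices and names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad_form(u, M, v):
    """u' M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))


def j_test_constant_quantile(xs, ys, zs):
    """Two-step GMM for m(z, theta) = theta with instruments w = (1, z), assuming serially
    independent data. Returns (theta_hat, J, p-value) with q - p = 2 - 1 = 1 degree of freedom."""
    T = len(xs)
    hits = [1.0 if y <= x else 0.0 for x, y in zip(xs, ys)]
    ws = [(1.0, z) for z in zs]
    a = [sum(hit * w[j] for hit, w in zip(hits, ws)) / T for j in range(2)]
    b = [sum(w[j] for w in ws) / T for j in range(2)]

    def s_inv(theta):
        # Inverse of the average outer product of g_t(theta) = (hit_t - theta) * w_t.
        g = [[(hit - theta) * wj for wj in w] for hit, w in zip(hits, ws)]
        return inv2([[sum(gt[i] * gt[j] for gt in g) / T for j in range(2)] for i in range(2)])

    theta_tilde = (b[0] * a[0] + b[1] * a[1]) / (b[0] ** 2 + b[1] ** 2)   # step 1
    weight = s_inv(theta_tilde)                                          # step 2
    theta_hat = quad_form(b, weight, a) / quad_form(b, weight, b)       # step 3
    g_bar = [a[j] - theta_hat * b[j] for j in range(2)]
    J = T * quad_form(g_bar, s_inv(theta_hat), g_bar)                   # J-statistic (4)
    p_value = math.erfc(math.sqrt(max(J, 0.0) / 2.0))                   # P(chi^2_1 > J)
    return theta_hat, J, p_value


rng = random.Random(3)
T = 5000
zs = [rng.gauss(0.0, 1.0) for _ in range(T)]
ys = [z + rng.gauss(0.0, 1.0) for z in zs]                  # hypothetical: Y | F ~ N(z, 1)
inv_cdf = NormalDist().inv_cdf
rational = [z + inv_cdf(0.7) for z in zs]                   # constant 0.7-quantile
state_dep = [z + inv_cdf(1.0 / (1.0 + math.exp(-(0.5 + z)))) for z in zs]   # level Lambda(0.5 + z)

print(j_test_constant_quantile(rational, ys, zs))    # small J expected
print(j_test_constant_quantile(state_dep, ys, zs))   # large J expected: rejected
```

A rejection here says only that no constant-level quantile is rational with respect to information sets containing $\sigma(z_t)$. A state-dependent specification model may still fit.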

Note that the concept of rationality used in Patton and Timmermann (2007b) and Elliott et al. (2005) needs to be specified in two respects: A point forecast can only be defined as rational with respect to a specific functional and a specific information set. Compared to standard rationality tests, we allow for additional flexibility in terms of the functional.

The choice of instruments $w$ for the rationality tests determines the information set for which we test. If a forecast $x$ is rational with respect to $\mathcal{F}$, it is also rational with respect to any smaller information set $\mathcal{G} \subseteq \mathcal{F}$, because we have

$$X = \alpha(Y \mid \mathcal{F}) \iff \mathbb{E}[V_\alpha(X, Y) \mid \mathcal{F}] = 0 \implies \mathbb{E}[V_\alpha(X, Y) \mid \mathcal{G}] = 0,$$

where $V_\alpha$ is the identification function of $\alpha$. If a test with instruments $w$ rejects rationality, the point forecast is not rational with respect to any information set $\mathcal{F}$ that contains the information set $\sigma(w)$ generated by $w$:

$$\mathbb{E}[V_\alpha(X, Y) \mid \sigma(w)] \neq 0 \implies X \neq \alpha(Y \mid \mathcal{F}).$$

In summary, the J-statistic based on the class of functionals $\mathcal{T}$ and instruments $w$ can be used to test the hypothesis

$$H_0: \text{There exists a functional } \alpha \in \mathcal{T} \text{ such that } X = \alpha(Y \mid \mathcal{F}) \text{ with } \sigma(w) \subseteq \mathcal{F}.$$

We use this fact in Section 5 to analyze size and power of rationality tests with respect to information rigidities.

Note that we use the term rationality instead of optimality, because a rational yet uninformed point forecast can only be rejected if the appropriate instruments are available. Furthermore, a misspecified or irrational forecast can still form a rational forecast with respect to a smaller information set. Hence, the term optimality may be misleading.
The literature provides several rationality tests for mean forecasts: serially uncorrelated forecast errors (Diebold and Lopez, 1996b), decreasing mean-squared error in a multi-horizon setting (Patton and Timmermann, 2012), and explanatory content of the forecasts in a least-squares regression (Mincer and Zarnowitz, 1969). None of these tests is well-behaved for other functionals (Patton and Timmermann, 2007a). Notably, however, they could be readily extended to a more general setting based on general loss functions and the generalized forecast error (Patton and Timmermann, 2007b, 2010). As a first step, one needs to determine the underlying functional of a forecast. In the absence of any forecasting directives or loss functions, our method provides a flexible approach for estimating the functional.


Increasing the information set results in a lower expected loss (Holzmann and Eulert, 2014). Consider forecasts issued for the same point in time, but at different horizons. The information set of the forecaster should steadily increase as the horizon shrinks, implying that the expected loss decreases. This can be extended to state-dependent functionals, assuming that the functional does not change with the horizon.^5


4.2 Specification tests for forecasting behavior 

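The estimation of models $m(z, \hat\theta_T)$ provides us with a whole set of tests. In this section, we propose some models and testable hypotheses. Consider a restriction $R(\theta_0) = 0$ for the model $m(z, \theta)$, where $R : \Theta \to \mathbb{R}^r$ is differentiable and $R_{(\theta)}(\theta_0)\, \Sigma\, R_{(\theta)}(\theta_0)'$ is non-degenerate, with $\Sigma$ the asymptotic covariance matrix of $\sqrt{T}(\hat\theta_T - \theta_0)$. Any such restriction can be tested using a Wald statistic of the form

$$W_T(\theta) := T\, R(\theta)' \big( R_{(\theta)}(\theta)\, \Sigma_T(\theta)\, R_{(\theta)}(\theta)' \big)^{-1} R(\theta),$$

where $R_{(\theta)}(\theta) = \partial R(\theta) / \partial \theta'$ is the derivative of $R$ evaluated at $\theta$ and $\Sigma_T(\theta)$ is a consistent estimator of $\Sigma$. It holds that

$$W_T(\hat\theta_T) \xrightarrow{d} \chi^2_r \quad \text{as } T \to \infty.$$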

The model $m(z, \theta) = \Lambda(\theta_1 + z \cdot \theta_2)$ specifies the asymmetry of the issued functional with respect to some state variable $z$. Here the logistic function $\Lambda(x) := (1 + \exp(-x))^{-1}$ ensures that the level of the quantile or expectile is in the unit interval for any $\theta = (\theta_1, \theta_2) \in \mathbb{R}^2$. Under the hypothesis that the forecasting behavior is not influenced by $z$,

$$H_0: \theta_0 \in A \quad \text{with} \quad A := \{\theta \in \Theta \mid \theta_2 = 0\},$$

it holds that

$$W_T(\hat\theta_T) \xrightarrow{d} \chi^2_1 \quad \text{as } T \to \infty.$$

We might also be interested in structural changes in the risk assessment at a time $t^* \in \{1, \ldots, T\}$. The model

$$m(t, \theta) := \mathbf{1}(t > t^*)\, \Lambda(\theta_1) + \mathbf{1}(t \le t^*)\, \Lambda(\theta_2)$$

facilitates tests for the hypothesis that there has been no structural break in the forecasting behavior:

$$H_0: \theta_0 \in A \quad \text{with} \quad A := \{\theta \in \Theta \mid \theta_1 - \theta_2 = 0\}.$$
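Consider a single linear restriction $R(\theta) = c'\theta$. Its derivative is $R_{(\theta)} = c'$, so the Wald statistic reduces to $T (c'\hat\theta)^2 / (c' \hat\Sigma c)$, which is compared with a $\chi^2_1$ distribution. The sketch below covers both hypotheses above: $\theta_2 = 0$ (no state dependence) and $\theta_1 - \theta_2 = 0$ (no structural break). The inputs are made-up numbers used only to show the arithmetic: the estimate $\hat\theta$, the covariance estimate $\hat\Sigma$ of $\sqrt{T}(\hat\theta_T - \theta_0)$, and $T$. They are not results from any data set.

```python
import math


def wald_single_restriction(theta_hat, Sigma_hat, c, T):
    """Wald statistic for the linear restriction R(theta) = c' theta = 0 (so R_(theta) = c').
    Returns (W_T, p-value from the chi^2_1 distribution)."""
    k = len(c)
    r = sum(c[i] * theta_hat[i] for i in range(k))
    var_r = sum(c[i] * Sigma_hat[i][j] * c[j] for i in range(k) for j in range(k))
    W = T * r * r / var_r
    return W, math.erfc(math.sqrt(W / 2.0))


# Illustrative numbers only, not estimates from any data set.
theta_hat = [0.3, 0.4]
Sigma_hat = [[0.5, 0.1], [0.1, 0.8]]
T = 200
print(wald_single_restriction(theta_hat, Sigma_hat, [0.0, 1.0], T))   # H0: theta_2 = 0
print(wald_single_restriction(theta_hat, Sigma_hat, [1.0, -1.0], T))  # H0: theta_1 - theta_2 = 0
```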

^5 It is worth noting that the derived bounds hold only under constant forecast preferences across forecast horizons.
Lastly, we suggest a method to detect seasonality-based asymmetry in the forecasts. The corresponding model is

$$m(t, \theta) := \theta_1 + \theta_2 \sin(2\pi t / \theta_3),$$

and it provides information about the constant term $\theta_1$, the magnitude $\theta_2$, and the period $\theta_3$ of the seasonal effect. Since $m$ must map into $(0, 1)$, the parameter space $\Theta$ has to be restricted accordingly, e.g., to $|\theta_2| < \min(\theta_1, 1 - \theta_1)$.
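The three specification models of this section are simple to code. The sketch below implements them together with a hypothetical use: mapping the state-dependent level into the quantile of a Gaussian predictive distribution $N(\mu, \sigma^2)$. The Gaussian choice, the break time, and all parameter values are assumptions for illustration. For the seasonal model, the parameters must keep the level inside $(0, 1)$.

```python
import math
from statistics import NormalDist


def logistic(x):
    """Lambda(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def m_state(z, theta):
    """State-dependent level m(z, theta) = Lambda(theta_1 + z * theta_2)."""
    return logistic(theta[0] + z * theta[1])


def m_break(t, theta, t_star):
    """Structural-break level 1(t > t*) Lambda(theta_1) + 1(t <= t*) Lambda(theta_2)."""
    return logistic(theta[0]) if t > t_star else logistic(theta[1])


def m_season(t, theta):
    """Seasonal level theta_1 + theta_2 sin(2 pi t / theta_3); must stay in (0, 1)."""
    return theta[0] + theta[1] * math.sin(2.0 * math.pi * t / theta[2])


def gaussian_state_quantile(mu, sigma, level):
    """Hypothetical use: the level-quantile of a N(mu, sigma^2) predictive distribution."""
    return mu + sigma * NormalDist().inv_cdf(level)


theta = (0.0, 1.0)
for z in (-1.0, 0.0, 1.0):
    level = m_state(z, theta)
    print(z, round(level, 3), round(gaussian_state_quantile(2.0, 1.0, level), 3))
```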


4.3 Generating a density forecast from a set of point forecasts 


With the estimated functional, we can embed point forecasts in the theoretical setting of probability forecasts. Consider a set of point forecasts that we would like to combine into a single probabilistic or density forecast. A well-known approach in forecast combination is linear pooling. In many applications, the average of a set of point forecasts has been found to outperform even the best-performing individual forecast. This finding is commonly referred to as the “wisdom of the crowds.” Aiolfi and Timmermann (2006) provide evidence that the forecasting performance of individual forecasters is persistent to some degree and that combination strategies conditional on past performance surpass standard combination methods.


Mannes et al. (2014) note that the best weighting between different forecasts depends on bracketing and dispersion of ability between the different forecasts. The average forecast systematically outperforms individual forecasts if forecast errors have different signs and similar variance.

We argue instead that it is more informative to analyze the whole set of forecasts. Estimating the functional behind a forecast provides a method to achieve calibration for each forecast individually. In a second step, we combine the set of point forecasts into a predictive density, where each forecast contributes according to its functional and its past performance.

Specifically, assume that for each point in time $t$, there are $n$ point forecasts $x_t = (x_{1,t}, \ldots, x_{n,t})$ of $y_t$. We begin by estimating the quantiles or expectiles quoted by each forecaster, $m(z, \hat\theta_1), \ldots, m(z, \hat\theta_n)$, respectively. We then compute the p-values $p_1, \ldots, p_n$ of the tests of overidentifying restrictions in (4) for each forecast. Let $\mathcal{F}_{t,i}$ be the information available to forecaster $i$ (and unknown to us). We estimate the conditional distribution of $Y_t$ given $\{\mathcal{F}_{t,1}, \ldots, \mathcal{F}_{t,n}\}$ as

$$\hat{\mathbb{P}}_t = \arg\min_{\mathbb{P} \in \mathcal{P}} \sum_{i=1}^{n} w(p_i)\, \big\| x_{i,t} - q_{m(z, \hat\theta_i)}(\mathbb{P}) \big\|, \tag{5}$$

where $w : [0, 1] \to \mathbb{R}_+$ is a monotonically increasing weight function and $q_m(\mathbb{P})$ is the $m$-quantile of the distribution $\mathbb{P}$. Expectile models can be combined analogously.
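As one concrete instance of (5), suppose $\mathcal{P}$ is the Gaussian family, the distance is the squared difference, and $w(p) = p$. These are assumptions, not recommendations from the text. The $m$-quantile of $N(\mu, \sigma^2)$ is $\mu + \sigma \Phi^{-1}(m)$. Problem (5) then becomes a weighted least-squares regression of the forecasts on $\Phi^{-1}(m_i)$, with intercept $\mu$ and slope $\sigma$. The sketch needs at least two distinct levels with positive weight, and it rejects a non-positive fitted scale. The levels, forecasts, and p-values in the usage lines are made up.

```python
from statistics import NormalDist


def combine_gaussian(forecasts, levels, pvalues, weight=lambda p: p):
    """Fit N(mu, sigma^2) to quantile forecasts by weighted least squares,
    minimizing sum_i w(p_i) * (x_i - (mu + sigma * Phi^{-1}(tau_i)))^2."""
    zq = [NormalDist().inv_cdf(tau) for tau in levels]
    wts = [weight(p) for p in pvalues]
    sw = sum(wts)
    z_bar = sum(w * z for w, z in zip(wts, zq)) / sw
    x_bar = sum(w * x for w, x in zip(wts, forecasts)) / sw
    s_zz = sum(w * (z - z_bar) ** 2 for w, z in zip(wts, zq))
    s_zx = sum(w * (z - z_bar) * (x - x_bar) for w, z, x in zip(wts, zq, forecasts))
    sigma = s_zx / s_zz
    if sigma <= 0.0:
        raise ValueError("fitted scale is not positive; the Gaussian fit is not valid")
    return x_bar - sigma * z_bar, sigma


levels = [0.2, 0.5, 0.9]                                            # hypothetical levels m(z_t, theta_i)
forecasts = [1.0 + 2.0 * NormalDist().inv_cdf(t) for t in levels]   # exact quantiles of N(1, 4)
pvalues = [0.3, 0.8, 0.5]                                           # hypothetical J-test p-values
print(combine_gaussian(forecasts, levels, pvalues))                 # approximately (1.0, 2.0)
```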

The crucial parameters for this density forecast are the probability class $\mathcal{P}$, the weight function $w(\cdot)$, and the distance function between the forecast and the functional defined by the norm $\|\cdot\|$. We recommend choosing among appropriate options by minimizing the average continuous ranked probability score on past values (Gneiting et al., 2007).


While it is often assumed that discrepancies between multiple point forecasts are due to high uncertainty in the underlying probability distribution (e.g., Lahiri and Sheng, 2010), the discussion above shows that this is not necessarily the case. For example, two point forecasts quoting different quantiles of a relatively sharp probability distribution might be quite different, whereas two forecasts quoting the same symmetric functional are identical regardless of the spread of an underlying symmetric probability distribution.
In Section 5.1, we illustrate the proposed procedure with the Federal Reserve’s GDP growth rate forecasts, which involve only $n = 1$ forecaster and admit a closed-form solution to the optimization problem in (5) for the class $\mathcal{P}$ of Gaussian distributions.

5 Examples 

In this section, we apply our methodology to real-world and simulated data sets. We compare the new approach to the related method of Patton and Timmermann (2007b), who consider spline loss functions with multiple nodes. This bears the risk of overfitting, as we illustrate with simulated data generated in the same way as in Patton and Timmermann (2007b). But first, we present additional insights into the GDP Greenbook forecasts of the Federal Reserve.


5.1 GDP Greenbook forecasts 

For the quarterly real GDP growth forecasts of the Federal Reserve over the period 1968 to 1999 ($T = 125$ observations), standard tests of optimality based on the mean functional reject the rationality of the forecast. Patton and Timmermann (2007b) model the loss function as a quadratic spline with three nodes whose asymmetry is allowed to change with respect to the current GDP value. While the estimated loss function explains the forecasting behavior very well and the respective tests accept rationality of the forecast, the loss function is difficult to interpret and the estimator is only uniquely identified under additional assumptions on the conditional distributions. This leads to unreliable tests, as we show in Section 5.2.
Here, we interpret the forecasts as a state-dependent quantile of the Federal Reserve’s probability prediction. To investigate whether the quoted quantile changes with the GDP growth rate $y$, we set $m(z, \theta) = m(y, \theta) = \Lambda(\theta_1 + y\,\theta_2)$ with a strictly monotone link function $\Lambda$, following the specification of state-dependent functionals introduced earlier. As instrumental variables $w$ for the GMM estimator we choose $w_t = (1, x_t, y_{t-1} - x_{t-1}, V(x_{t-1}, y_{t-1}))'$.


Performing a test of overidentifying restrictions (see Section 4.1), we obtain a $\chi^2$ statistic (with $4 - 2 = 2$ degrees of freedom) of 0.80, corresponding to a p-value of 0.67. Consequently, there is no reason to reject rationality if we allow for state-dependent quantile forecasts.
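As a quick arithmetic check (our own, not part of the original analysis), the reported p-value is consistent with $q - p = 4 - 2 = 2$ degrees of freedom, where the 4 instruments are the components of $w_t$ and the 2 parameters are $\theta_1, \theta_2$. For an even number of degrees of freedom the $\chi^2$ survival function has a closed form, so the sketch needs only the standard library; the function name is hypothetical.

```python
import math


def chi2_sf_even_df(stat, df):
    """P(chi^2_df > stat) for a positive even df.

    Closed form: exp(-s/2) * sum_{k < df/2} (s/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


# q = 4 instruments and p = 2 parameters give q - p = 2 degrees of freedom
p_value = chi2_sf_even_df(0.80, df=4 - 2)
print(round(p_value, 2))  # 0.67
```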

Compared to the spline loss function, we require only two instead of six parameters, and our more powerful test (see Section 5.2) does not reject the hypothesis of a rational forecast. The need for additional instruments for more parameters in the approach of Patton and Timmermann (2007b) is especially troublesome for such limited sample sizes.

As illustrated in Figure 1, the forecasts can now be interpreted as state-dependent $m(y, \hat\theta)$-quantiles that depend on the current growth rate $y$. The forecasts are more conservative during recessions: the Federal Reserve reports lower quantile levels during times of low growth. This quantifies the amount of asymmetry in an interpretable way and holds without further information about the predictive distribution.

Importantly, it is also possible to compute the asymptotic distribution of the estimated $m(y, \hat\theta)$-quantile, as discussed in Section 3.3. The estimator is asymptotically unbiased, and the asymptotic variance is given by

$$\Sigma / T = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,2} \\ \sigma_{1,2} & \sigma_{2,2} \end{pmatrix} = \begin{pmatrix} 0.30 & -0.07 \\ -0.07 & 0.02 \end{pmatrix}. \qquad (6)$$
Figure 1: For the GDP data, the estimated state-dependent $m(y)$-quantile plotted against the growth rate $y$, together with asymptotic pointwise confidence intervals for confidence levels 0.4 and 0.8 derived from the asymptotic distribution of the estimate.
Figure 2: The predicted quantiles in the period from 1990Q4 to 2007Q4. The model was estimated with data from 1969Q4 up to 1999Q4. Starting in 2000Q1, the quantiles can be seen as out-of-sample predictions of the connection between the GDP growth forecasts and the true conditional distribution.
Table 1: Mean-squared error (MSE) and mean flexible (i.e., state-dependent) lin-lin loss (MFLL) for the Federal Reserve’s GDP forecast $x_t$ and the improved forecast $\hat\mu_t$ for the out-of-sample period 2000 to 2007.

Forecast        MSE     MFLL
$\hat\mu_t$     0.95    1
$x_t$           1       0.95

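Hence, asymptotically, we have

$$\Lambda^{-1}\big(m(y, \hat\theta)\big) = \hat\theta_1 + y\,\hat\theta_2 \;\overset{a}{\sim}\; \mathcal{N}\big(\theta_{0,1} + y\,\theta_{0,2},\; \sigma_{1,1} + y^2\sigma_{2,2} + 2y\,\sigma_{1,2}\big).$$

As $\Lambda(\cdot)$ is strictly monotone, we can calculate pointwise confidence intervals for $\theta_1 + y\,\theta_2$ for varying values of $y$ and transform them into confidence intervals for $m(y, \theta) = \Lambda(\theta_1 + y\,\theta_2)$. Figure 1 shows such confidence intervals.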
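The transformation of pointwise intervals can be sketched as follows. The covariance entries are taken from (6); the point estimate `theta_hat` is not reported in this section, so any value passed in is hypothetical, as is the choice of a logistic function for the link $\Lambda$ (any strictly increasing link works the same way).

```python
import math
from statistics import NormalDist

# Entries of the asymptotic covariance Sigma / T in (6)
S11, S12, S22 = 0.30, -0.07, 0.02


def logistic(u):
    """Hypothetical strictly increasing link Lambda, chosen only for illustration."""
    return 1.0 / (1.0 + math.exp(-u))


def quantile_level_ci(theta_hat, y, level=0.8, link=logistic):
    """Pointwise CI for m(y, theta) = link(theta_1 + y * theta_2).

    A normal interval for the index theta_1 + y * theta_2 is mapped through the
    strictly increasing link, so the endpoints keep their order.
    """
    index = theta_hat[0] + y * theta_hat[1]
    se = math.sqrt(S11 + y**2 * S22 + 2 * y * S12)  # sd of theta_1-hat + y * theta_2-hat
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return link(index - z * se), link(index), link(index + z * se)
```

For example, `quantile_level_ci((-0.5, 0.3), y=1.0)` returns the lower endpoint, the point estimate and the upper endpoint of an 80% interval for $m(1, \theta)$ under these made-up estimates; evaluating it over a grid of $y$ values traces out bands of the kind shown in Figure 1.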

Next, we implement the Wald test introduced in Section 4. Here, the hypothesis that the forecaster’s behavior does not change with respect to the current growth rate $y$,

$$H_0: \theta_0 \in A \quad \text{with} \quad A := \{\theta \in \Theta \mid \theta_2 = 0\},$$

is clearly rejected (p-value $< 0.01$). Thus, for the forecast to be rational, the underlying preferences need to be not only asymmetric but also flexible with respect to a state variable.
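For a single restriction, the Wald statistic is $W = \hat\theta_2^2 / \sigma_{2,2}$, which is asymptotically $\chi^2_1$ under $H_0$. Because $\hat\theta_2$ itself is not reported in this section, the sketch below instead uses $\sigma_{2,2} = 0.02$ from (6) to compute how large $|\hat\theta_2|$ must be for a p-value below 0.01; this back-of-the-envelope calculation is our own illustration.

```python
import math
from statistics import NormalDist

SIGMA22 = 0.02  # asymptotic variance of theta_2-hat, the (2, 2) entry of Sigma / T in (6)


def wald_pvalue(theta2_hat, var=SIGMA22):
    """p-value of the Wald test of H0: theta_2 = 0 (one restriction, chi^2_1)."""
    w = theta2_hat**2 / var
    # P(chi^2_1 > w) = P(|N(0, 1)| > sqrt(w)) = erfc(sqrt(w / 2))
    return math.erfc(math.sqrt(w / 2))


# Smallest |theta_2-hat| that yields a p-value below 0.01, given (6)
threshold = NormalDist().inv_cdf(0.995) * math.sqrt(SIGMA22)
print(round(threshold, 3))  # 0.364
```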

Figure 2 illustrates the asymmetry in the forecast and its variation over time. The estimated quantiles differ significantly from the median. The realized values after the year 2000 can be used to compute the issued quantile and to analyze the predictive power of the model for future forecasts.

To this end, we estimate the predictive density as described in Section 4. For simplicity, we take $\mathcal{P}$ to be the class of normal distributions with variance $\sigma_t^2 = (y_{t-1} - y_{t-2})^2$. In this case, the optimization problem in (5) reduces to

$$\hat\mu_t = \operatorname*{argmin}_{\mu} \big\| q_{m(\hat\theta, y_{t-1})}\big(\mathcal{N}(\mu, \sigma_t^2)\big) - x_t \big\|,$$

which is solved by $\hat\mu_t = x_t - \sigma_t\, q_{m(\hat\theta, y_{t-1})}(\mathcal{N}(0, 1))$, where $q_m(\mathcal{N}(0, 1))$ is the $m$-quantile of the standard normal distribution. That is, the estimated predictive density at time $t$ is given by $\mathcal{N}(\hat\mu_t, \sigma_t^2)$, and the appropriate point forecast for a given loss function is the corresponding functional. The optimal point forecast under the squared error loss, $L(x, y) = (x - y)^2$, is the mean $\hat\mu_t$ of the predictive distribution. The original forecasts by the Federal Reserve were estimated to be the $m(\hat\theta, y_{t-1})$-quantiles of these forecast distributions, which means that they are optimal under the state-dependent lin-lin loss $\big(\mathbb{1}(e_t > 0) - m(\hat\theta, y_{t-1})\big)\, e_t$, where $e_t := x_t - y_t$.
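The closed-form solution can be sketched in a few lines. The function name is hypothetical, and `m_level` stands for the estimated quantile level $m(\hat\theta, y_{t-1})$, which would come from the fitted model; the numbers in the example call are made up.

```python
from statistics import NormalDist


def predictive_mean(x_t, y_lag1, y_lag2, m_level):
    """Gaussian predictive density N(mu_t, sigma_t^2) implied by a quantile forecast x_t.

    sigma_t = |y_{t-1} - y_{t-2}| as in the text; m_level plays the role of the
    estimated level m(theta_hat, y_{t-1}). Degenerate if y_{t-1} == y_{t-2}.
    """
    sigma_t = abs(y_lag1 - y_lag2)
    mu_t = x_t - sigma_t * NormalDist().inv_cdf(m_level)  # shift the quantile back to the mean
    return mu_t, sigma_t


mu, sigma = predictive_mean(x_t=2.0, y_lag1=1.5, y_lag2=0.5, m_level=0.4)
print(mu, sigma)
```

Since a level of 0.4 lies below the median, the implied mean lies above the quoted forecast, and the 0.4-quantile of $\mathcal{N}(\hat\mu_t, \sigma_t^2)$ recovers $x_t$ exactly.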

As shown in Table 1, in the out-of-sample period 2000 to 2007 the Federal Reserve’s forecasts $x_t$ indeed have lower state-dependent lin-lin loss, while the means $\hat\mu_t$ of the predictive densities outperform the original forecasts in terms of mean-squared error.
5.2 Simulation study


In this section, we illustrate that our approach, while flexible, still produces well-calibrated and powerful rationality tests. For ease of comparison with the approach of Patton and Timmermann (2007b), we reproduce their simulation study. Each dataset $y_1, \dots, y_T$ is generated from an AR(1)-GARCH(1,1) model of the form

$$Y_t = 0.5\,Y_{t-1} + \sigma_t \varepsilon_t, \quad t = 1, 2, \dots, T,$$
$$\sigma_t^2 = 0.1 + 0.8\,\sigma_{t-1}^2 + 0.1\,\sigma_{t-1}^2 \varepsilon_{t-1}^2,$$
$$\varepsilon_t \overset{\text{iid}}{\sim} \mathcal{N}(0, 1).$$

We generate 3,000 datasets from this model for each of six sample sizes $T \in \{50, 100, 250, 500, 1000, 2000\}$. For all data, we calculate the optimal predictions corresponding to a quadratic loss function of the form

$$L_a(x, y) = \begin{cases} a\,(x - y)^2, & x > y, \\ (x - y)^2, & x \le y. \end{cases}$$

The optimal prediction is

$$x^* = \operatorname*{argmin}_{x} \mathbb{E}_P\big[L_a(x, Y_{t+1})\big] = e_{1/(1+a)}(P),$$

where $e_{1/(1+a)}(P)$ denotes the expectile of $P$ at level $1/(1+a)$. We set $a = 1.85$ and obtain, for a Gaussian distribution $P$, the asymmetric forecast

$$\operatorname*{argmin}_{x} \mathbb{E}_P\big[L_{1.85}(x, Y_{t+1})\big] = e_{1/2.85}(P) = \mu_P + \sigma_P\, e_{1/2.85}(\mathcal{N}(0, 1)) \approx \mu_P - 0.25\,\sigma_P,$$

where $\mu_P$ and $\sigma_P$ denote the conditional expectation and standard deviation of $P$. We use the method described in Jones (1994) to compute the expectile. Let $\mathcal{I}_t$ be the filtration generated by the time series $Y$:

$$\mathcal{I}_t = \sigma(Y_t, Y_{t-1}, \dots).$$

A fully informed forecaster with the information set

$$\mathcal{F}_t = \mathcal{I}_{t-1} = \sigma(Y_{t-1}, Y_{t-2}, \dots)$$

at time $t - 1$ would issue the rational forecast

$$x_t^* = 0.5\,Y_{t-1} - 0.25\,\sigma_t.$$
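The constant $-0.25$ and the full-information forecast $x_t^*$ can be reproduced with a short simulation sketch. To stay dependency-free, it computes the expectile by bisection on the first-order condition $\tau\,\mathbb{E}[(Y - e)^+] = (1 - \tau)\,\mathbb{E}[(e - Y)^+]$ rather than with the method of Jones (1994); the start at the unconditional variance, the seed, and all names are our assumptions.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def normal_expectile(tau, lo=-10.0, hi=10.0, tol=1e-12):
    """tau-expectile of N(0, 1) by bisection on its first-order condition.

    The expectile e solves tau * E[(Y - e)^+] = (1 - tau) * E[(e - Y)^+].
    Bisection is a stand-in here, not the method of Jones (1994).
    """
    def foc(e):
        upper = STD.pdf(e) - e * (1 - STD.cdf(e))  # E[(Y - e)^+]
        lower = e * STD.cdf(e) + STD.pdf(e)        # E[(e - Y)^+]
        return tau * upper - (1 - tau) * lower     # strictly decreasing in e

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid                               # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_ar_garch(T, a=1.85, seed=0):
    """Simulate the AR(1)-GARCH(1,1) model and the full-information forecasts x_t*.

    x_t* = 0.5 * Y_{t-1} + sigma_t * e_{1/(1+a)}(N(0, 1)), issued at time t - 1.
    """
    rng = random.Random(seed)
    shift = normal_expectile(1 / (1 + a))          # about -0.245 for a = 1.85
    sigma2 = 1.0                                   # unconditional variance 0.1 / (1 - 0.8 - 0.1)
    y_prev, eps_prev = 0.0, 0.0
    ys, xs = [], []
    for t in range(T):
        if t > 0:
            sigma2 = 0.1 + 0.8 * sigma2 + 0.1 * sigma2 * eps_prev**2
        sigma = math.sqrt(sigma2)
        xs.append(0.5 * y_prev + sigma * shift)    # sigma_t is known at t - 1
        eps_prev = rng.gauss(0.0, 1.0)
        y_prev = 0.5 * y_prev + sigma * eps_prev
        ys.append(y_prev)
    return ys, xs
```

With $a = 1.85$ the bisection returns roughly $-0.245$, which the text rounds to $-0.25$.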


Applying the standard two-step GMM estimator described in Section 3.1, we compute the overidentifying restriction tests of forecast rationality from Section 4.1 at significance level 5%. We compare the tests based on state-dependent quantiles and expectiles to the
Figure 3: Size of competing rationality tests for the rational full-information forecast. The black line corresponds to the chosen significance level of 0.05.


flexible spline test introduced in Patton and Timmermann (2007b), where all models allow flexibility with respect to the current outcome variable $y_t$. As instruments $w_t$ we choose a constant, the forecast, the forecast error, the squared forecast error, and one additional lag of these variables.
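A compact sketch of the two-step GMM estimator and the J-test for a state-dependent quantile follows. It is not the code used for the study: the data-generating process, the logistic link, the instruments $(1, z_t, x_t)$, the Nelder-Mead optimizer, and all names are illustrative assumptions. The moments multiply the quantile identification function $\mathbb{1}(y_t < x_t) - m(z_t, \theta)$ by the instruments, and the second step reweights with the inverse of the estimated $S$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import chi2, norm

NM_OPTIONS = {"xatol": 1e-6, "fatol": 1e-8, "maxiter": 2000}


def moments(theta, x, y, z, w):
    """T x q matrix of moments g_t = (1(y_t < x_t) - m(z_t, theta)) * w_t."""
    m = expit(theta[0] + theta[1] * z)      # assumed logistic link for m(z, theta)
    h = (y < x).astype(float) - m           # quantile identification function
    return h[:, None] * w


def gmm_objective(theta, x, y, z, w, weight):
    """T * gbar' W gbar for a given weighting matrix W."""
    gbar = moments(theta, x, y, z, w).mean(axis=0)
    return len(y) * gbar @ weight @ gbar


def two_step_gmm(x, y, z, w, theta0=(0.0, 0.0)):
    """Two-step GMM estimate, J statistic and its chi^2_{q-p} p-value."""
    q = w.shape[1]
    step1 = minimize(gmm_objective, theta0, args=(x, y, z, w, np.eye(q)),
                     method="Nelder-Mead", options=NM_OPTIONS)
    g = moments(step1.x, x, y, z, w)
    s_hat = g.T @ g / len(y)                # estimate of S = E[g g']
    step2 = minimize(gmm_objective, step1.x, args=(x, y, z, w, np.linalg.inv(s_hat)),
                     method="Nelder-Mead", options=NM_OPTIONS)
    j_stat = float(step2.fun)               # J = T * gbar' S^{-1} gbar at the second step
    p_value = float(chi2.sf(j_stat, df=q - len(step2.x)))
    return step2.x, j_stat, p_value


# Hypothetical example: Y_t | z_t ~ N(0.5 z_t, 1), and the forecaster reports
# the m(z_t, theta_true)-quantile of that conditional distribution.
rng = np.random.default_rng(1)
T = 2000
theta_true = np.array([-0.5, 0.8])
z = rng.standard_normal(T)
y = 0.5 * z + rng.standard_normal(T)
x = 0.5 * z + norm.ppf(expit(theta_true[0] + theta_true[1] * z))
w = np.column_stack([np.ones(T), z, x])    # instruments: constant, state, forecast
theta_hat, j_stat, p_value = two_step_gmm(x, y, z, w)
print(theta_hat, j_stat, p_value)
```

With three instruments and two parameters, the J statistic is compared with a $\chi^2_1$ distribution. In this correctly specified example the p-value should typically be unremarkable, whereas a forecast that does not match any member of the model class would tend to push it toward zero.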

In Figure 3, we see that the quantile- and expectile-based rationality tests are better calibrated in terms of size than the spline test. The identified functionals provide a solution to the problem of the state-dependent spline test, which “appears to require large samples (T > 1000) before the test’s size is close to its nominal value, and thus rejections obtained using this test must be interpreted with caution” (Patton and Timmermann, 2007b). In contrast to the spline-based estimation, the state-dependent quantile and expectile models may provide insightful point estimates and confidence intervals even for moderate sample sizes.

For the power analysis, we construct a forecast that is not rational with respect to the full information set. A forecaster exposed to information rigidities (see Coibion and Gorodnichenko, 2015) that prevent the observation of the present values of the time series,

$$\mathcal{F}_t = \mathcal{I}_{t-2} = \sigma(Y_{t-2}, Y_{t-3}, \dots),$$

issues rational forecasts with

$$\mu_{Y_t \mid \mathcal{F}_t} = \mu_{Y_t \mid \mathcal{I}_{t-2}} = 0.25\,Y_{t-2}$$

and

$$\sigma^2_{Y_t \mid \mathcal{F}_t} = \sigma^2_{Y_t \mid \mathcal{I}_{t-2}} = 1.15\,\sigma_{t-1}^2 + 0.1.$$
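The two conditional moments can be checked numerically. The sketch below conditions on hypothetical values of $Y_{t-2}$ and $\sigma_{t-1}^2$, simulates $\varepsilon_{t-1}$ and $\varepsilon_t$, and compares the Monte Carlo mean and variance of $Y_t$ with $0.25\,Y_{t-2}$ and $1.15\,\sigma_{t-1}^2 + 0.1$; it is an illustrative check, not part of the original study.

```python
import random


def two_step_moments_mc(y_lag2, sigma2_lag1, n=100_000, seed=0):
    """Monte Carlo mean and variance of Y_t given I_{t-2} in the AR(1)-GARCH(1,1) model.

    Conditioning on I_{t-2} fixes Y_{t-2} and sigma_{t-1}^2; eps_{t-1} and eps_t are iid N(0, 1).
    """
    rng = random.Random(seed)
    sigma_lag1 = sigma2_lag1 ** 0.5
    draws = []
    for _ in range(n):
        eps_lag1 = rng.gauss(0.0, 1.0)
        y_lag1 = 0.5 * y_lag2 + sigma_lag1 * eps_lag1
        sigma2_t = 0.1 + 0.8 * sigma2_lag1 + 0.1 * sigma2_lag1 * eps_lag1**2
        draws.append(0.5 * y_lag1 + sigma2_t ** 0.5 * rng.gauss(0.0, 1.0))
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    return mean, var


y_lag2, sigma2_lag1 = 2.0, 1.5
mc_mean, mc_var = two_step_moments_mc(y_lag2, sigma2_lag1)
print(mc_mean, 0.25 * y_lag2)              # compare with 0.25 * Y_{t-2}
print(mc_var, 1.15 * sigma2_lag1 + 0.1)    # compare with 1.15 * sigma_{t-1}^2 + 0.1
```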

One modification was implemented: the nodes of the splines are located at the conditional sample means $\{\mathbb{E}[e_t \mid e_t < 0], 0, \mathbb{E}[e_t \mid e_t > 0]\}$ instead of the conditional sample medians $\{\mathrm{med}[e_t \mid e_t < 0], 0, \mathrm{med}[e_t \mid e_t > 0]\}$. When using the medians, the nodes of the spline are so close to zero that the resulting loss function is almost identical to the quad-quad loss, concealing the difference from the expectile-based test.
Figure 4: Size and power of competing rationality tests for the forecast with information
rigidity. The solid lines represent the size for tests with lagged instruments (0.05 is optimal). 
The dashed lines represent the power for tests with non-lagged instruments (larger is better). 


The derivation of the rational forecast under such an information rigidity can be found in Appendix F. This produces a rational forecast with respect to the information set $\mathcal{I}_{t-2}$. The same forecast is not rational with respect to the information set $\mathcal{I}_{t-1}$ of variables actually observable when issuing the forecast for $Y_t$, which allows us to evaluate the power of the rationality test against information rigidities. A well-performing test accepts rationality for lagged instruments based on $\mathcal{F}_t = \mathcal{I}_{t-2}$ according to the 5% level of the test, and rejects rationality for non-lagged instruments based on $\mathcal{I}_{t-1}$. The lagged instruments are simply $w_{t-1}$. The results of this Monte Carlo experiment are presented in Figure 4. Our quantile-based rationality test shows the best size calibration and is the most powerful. The spline-based test is strongly oversized for small sample sizes and often unable to detect the information rigidity even for large sample sizes.

While the asymptotic properties of the tests based on quantiles and expectiles are well understood, we cannot expect tests of overidentifying restrictions based on the non-identified spline estimator to be well calibrated (see, e.g., Stock and Wright, 2000).

Hence, our tests are not only better calibrated but also more powerful. The advantage of the functional-based tests is even greater for more flexible models with more parameters, where the spline-based test would require a large number of instruments to obtain an overidentified estimator.

For the simple case of a constant functional, the spline-based test is still oversized, but to a much lower degree (based on additional simulations not shown here). In general, the advantage of the functional-based tests seems to grow with the flexibility in the underlying preferences.


6 Discussion


For point forecasts with unknown directives, it is preferable to estimate and test the functional quoted by the forecaster rather than the loss function. While the specific loss function cannot be identified without further information, we have shown that the partial derivative of any consistent loss function may be used to estimate the functional.

We defined the classes of state-dependent quantiles and expectiles and showed that, under the assumption of rational forecasts, the functional can be consistently estimated. The asymptotic distribution of the GMM estimator and the test statistic of overidentifying restrictions can be used to construct flexible tests of forecast rationality and of specific model properties. In a simulation study, we illustrated that an existing spline-based test is oversized and unlikely to detect information rigidities, while the new estimators yield well-calibrated and powerful tests.

We further showed that the GDP forecasts of the Federal Reserve can be rationalized as 
state-dependent quantiles changing with the current growth rate, and we constructed a test 
which indicated that the asymmetry depends on the growth rate. 

Some functionals, such as the mode and the expected shortfall, cannot be expressed as risk-minimizing rules with respect to any loss function for broad classes of probability distributions (Gneiting, 2011; Heinrich, 2014). However, Fissler and Ziegel (2016) showed that expected shortfall is jointly elicitable with value-at-risk, and the results presented here can be extended to such multivariate functionals. More generally, the specification tests introduced in Section 4 provide a new tool to generate backtests for value-at-risk forecasts or any other elicitable risk measure. Such backtests are powerful against misspecified risk models with respect to observable state variables and directly reveal the nature of the underlying errors.

An important application of the new approach is the comparison of a point forecast with unknown directive to other point or probability forecasts. In this situation, the functional represented by the point forecast needs to be extracted from the probability forecast, because the results of such a comparison might differ substantively depending on the applied functional. We propose to estimate the functional represented by the point forecasts as described in Section 3, and then to extract the estimated functional from the probability predictions. The resulting two sets of point forecasts can be compared with any consistent loss function for the functional (see Giacomini and White, 2006).

Lieli and Stinchcombe (2013) discuss the recoverability of the loss function if the (possibly misspecified) predictive densities are observable. In recent work, Patton (2014) showed that under misspecification the ranking of competing forecasts is not robust to different choices of consistent loss functions, and therefore the forecaster should specify the loss function rather than merely the functional. But, as we point out, only the functional can be identified from the time series of realizations and forecasts alone, and therefore the estimation of a single specific loss function is impossible unless additional information is available. Ehm et al. (2016) provide a one-dimensional class of extremal scores for quantiles and expectiles from which every other consistent loss function can be constructed by a convex combination. Hence, even misspecified quantile or expectile forecasts can be compared without explicit knowledge of the loss function, as long as the class of extremal scores is taken into account.
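To illustrate the mixture representation for quantiles: following Ehm et al. (2016), the elementary score at threshold $\theta$ is $S_{\alpha,\theta}(x, y) = (\mathbb{1}(y < x) - \alpha)(\mathbb{1}(\theta < x) - \mathbb{1}(\theta < y))$, and integrating it over $\theta$ with unit weight recovers the lin-lin loss $(\mathbb{1}(y < x) - \alpha)(x - y)$. The numerical check below is a sketch; the integration range, grid size, and function names are arbitrary choices.

```python
def elementary_quantile_score(x, y, alpha, theta):
    """Elementary score for the alpha-quantile at threshold theta."""
    return ((y < x) - alpha) * ((theta < x) - (theta < y))


def lin_lin(x, y, alpha):
    """Lin-lin (pinball) loss (1(y < x) - alpha) * (x - y)."""
    return ((y < x) - alpha) * (x - y)


def mixture_of_elementary_scores(x, y, alpha, lo=-10.0, hi=10.0, n=20_000):
    """Integrate the elementary scores over theta with unit weight (midpoint rule)."""
    step = (hi - lo) / n
    return step * sum(
        elementary_quantile_score(x, y, alpha, lo + (k + 0.5) * step) for k in range(n)
    )


print(mixture_of_elementary_scores(1.0, 0.0, 0.3), lin_lin(1.0, 0.0, 0.3))
```

Replacing the unit weight by another nonnegative weight over $\theta$ yields a different consistent loss for the same quantile, which is why a comparison across all thresholds does not require choosing one loss in advance.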
A Proof of Lemma 1


Regularity Conditions 1. 


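1. $\alpha: \mathcal{P} \to \mathbb{R}$ is elicitable with respect to the class $\mathcal{P}$ of absolutely continuous probability distributions.

2. $\mathcal{L}(Y \mid \mathcal{F}) \in \mathcal{P}$.

3. $\alpha: \mathcal{P} \to \mathbb{R}$ is a continuous functional.

4. $\alpha: \mathcal{P} \to \mathbb{R}$ is locally nonconstant.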


Terminology and definitions follow Steinwart et al. (2014). Assumption 1 limits the class of functionals to those that can be defined via loss functions on the set $\mathcal{P}$ of possible distributions. For expectiles and quantiles, this includes all distributions with finite first moment (Gneiting, 2011).

Assumption 2 refers to the forecasting behavior. The functional is applied to the conditional distribution of $Y$ given $\mathcal{F}$, which is itself $\mathcal{F}$-measurable. We make no assumptions here about the information set $\mathcal{F}$. The arising conditional distributions need to be in $\mathcal{P}$; in particular, we assume that the conditional distribution has no singular component. Note, however, that this assumption could be relaxed if we instead use the canonical extension $L_{(x)}(x, y) = 0$ on the set where the partial derivative $L_{(x)}(x, y)$ does not exist. In this case, the partial derivative of any consistent loss function does not coincide almost surely with the identification function.

For example, in the case of the mean functional we choose $\mathcal{P}$ to be the set of continuous distributions with finite first moment. With respect to these distributions the mean is elicitable, nonconstant and continuous.


Note that the assumptions above can be generalized to include set-valued functionals (Gneiting, 2011).


Proof of Lemma 1. By assumption, there exists an identification function $V$, and for every locally Lipschitz-continuous consistent loss function $L$, the partial derivative $L_{(x)}(t, y)$ exists for $\lambda \times \mu$-almost all $(t, y)$, where $\lambda$ is the Lebesgue measure on the image of $\alpha$ and the distributions in $\mathcal{P}$ are absolutely continuous with respect to $\mu$, and

$$L_{(x)}(t, y) = w(t)\, V(t, y)$$

for some bounded function $w$ (see Steinwart et al., 2014).


By definition of the identification function, it holds for all $P \in \mathcal{P}$ that

$$\mathbb{E}_P[V(t, Y)] = 0 \iff t = \alpha(P).$$
In particular, it follows that $\mathbb{E}[V(X, Y) \mid \mathcal{F}] = 0$, as $\mathcal{L}(Y \mid \mathcal{F}) \in \mathcal{P}$ and $X = \alpha(\mathcal{L}(Y \mid \mathcal{F}))$. As $\mathcal{L}(Y \mid \mathcal{F})$ is absolutely continuous and $X$ is constant given $\mathcal{F}$, the identification function $V(X, Y)$ exists almost surely.

Any $\mathcal{F}$-measurable variable $W$ remains constant under the integral. Integration reveals the unconditional moment condition

$$\mathbb{E}[V(X, Y)\, W] = 0 \quad \text{for all } \mathcal{F}\text{-measurable } W.$$


□ 


B Proof of Theorem 1


We identify the function $g(m(z, \theta), x, y, w)$ with $g(v, \theta)$, where $v = (x, y, z, w)$. For ease of notation, we introduce the function $g(v, \theta) = h(v, \theta)\, w$, where $h(v, \theta)$ is the identification function of the respective functional. By Lemma 1, $\mathbb{E}[h(v, \theta_0) \mid \mathcal{F}] = 0$.

Recall that $S := \mathbb{E}[g(V, \theta_0)\, g(V, \theta_0)']$ and $G := \mathbb{E}[g_{(\theta)}(V, \theta_0)]$, and let $\hat{S}$ be an estimator of $S$.

Regularity Conditions 2. 

1. The stochastic process $\{v_t \mid t \in \mathbb{N}\}$ is ergodic (in means) and strictly stationary.

2. $\mathcal{L}(Y \mid \mathcal{F}) \in \mathcal{P}$, where $\mathcal{P}$ is the class of absolutely continuous distributions with strictly positive densities and finite first moments.

3. The parameter space $\Theta \subset \mathbb{R}^p$ is compact.

4. The model $m(z, \theta)$ is continuous on $\Theta$ for all $z$.

5. The state variable $z$ is $\mathcal{F}$-measurable.

6. $\mathbb{E}[\|w_t\|] < \infty$, $\mathbb{E}[\|w_t x_t\|] < \infty$ and $\mathbb{E}[\|w_t y_t\|] < \infty$ for all $t \in \mathbb{N}$.

7. $\hat{S} \to S$, where $S$ is positive definite.

Assumption 1 would also follow directly if we assumed that $y_t$ is stationary and ergodic and the information sigma-algebra is generated exclusively by a finite set of past realizations, $\mathcal{F}_t = \sigma(y_{t-1}, \dots, y_{t-t'})$. Then $x$ and $w$ are ergodic and stationary, as measurable functions of a finite number of ergodic and stationary variables. Assumption 2 ensures that the state-dependent quantiles and expectiles fulfill Regularity Conditions 1. Expectiles do not require strictly positive densities, as they are only set-valued for the special case of a Dirac measure. The requirement of an absolutely continuous density could also be relaxed. In this case, the partial derivative $L_{(x)}(x, y)$ of any consistent loss function need not exist almost surely, and additional assumptions would have to guarantee that the parameter is uniquely identified, as multiple quantile levels coincide. Assumption 3 could be substituted by a concave target function (Newey and McFadden, 1994, Theorem 2.7). Assumption 4 is easily verified; it does, however, limit the scope of eligible models for the estimation. Note that the models need only be continuous in $\theta$; they are not restricted with respect to the state variable $z$. The extension of Theorem 1 to discontinuous models is beyond the scope of this paper, but could be achieved through the concept of equicontinuity. See Newey and McFadden (1994) for theory and Giacomini and Komunjer (2005) for an application to quantile forecasts.


Strict stationarity means that the distribution of $(v_t, v_{t+1}, \dots, v_{t+s})$ does not depend on $t$ for any $s$, and (mean) ergodicity implies that $\frac{1}{T}\sum_{t=1}^{T} a(v_t) \to \mathbb{E}[a(v_t)]$ for measurable functions $a(\cdot)$ with $\mathbb{E}[|a(v_t)|] < \infty$ (Newey and McFadden, 1994).
Lemma 2 (Orientation). For state-dependent quantiles and expectiles, there exists an oriented identification function. That is, for all $P \in \mathcal{P}$ it holds that

$$\mathbb{E}_P[V(x, Y)] > 0 \iff x > \alpha(P).$$

Proof of Lemma 2. For every elicitable, continuous and quasi-monotonic functional, either $V$ or $-V$ is oriented (Steinwart et al., 2014).


We show that quantiles and expectiles fulfill Regularity Conditions 1. Quantiles and expectiles are elicitable for the class of distributions with finite first moments. They are continuous functionals. Given a strictly positive density, they are single-valued. Both are locally nonconstant, as a shifted distribution is also contained in the class and the new functional is shifted by the same amount.

It follows directly that the identification function for $m(z, \theta)$-quantiles,

$$V(x, y) = (1 - m(z, \theta))\,\mathbb{1}(y < x) - m(z, \theta)\,\mathbb{1}(y > x),$$

and for $m(z, \theta)$-expectiles,

$$V(x, y) = \big[(1 - m(z, \theta))\,\mathbb{1}(y < x) + m(z, \theta)\,\mathbb{1}(y > x)\big](x - y),$$

are oriented. □

Lemma 2 is useful to show that $m(z, \theta) > m(z, \theta_0)$ implies $\mathbb{E}[h(v, \theta) \mid \mathcal{F}] > 0$.

Lemma 3 (Uniqueness). Given Regularity Conditions 2, it holds that

$$\mathbb{E}[h(v, \theta) \mid \mathcal{F}] = 0 \iff \theta = \theta_0.$$

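Proof of Lemma 3. We first prove the lemma for the class of quantiles. With Lemma 1 it follows that $h(v, \theta_0)$ fulfills the conditional moment condition

$$\mathbb{E}[h(v, \theta_0) \mid \mathcal{F}] = 0.$$

For any $\theta \in \Theta$ with $\theta \ne \theta_0$, it holds that

$$\begin{aligned} \mathbb{E}[h(v, \theta) \mid \mathcal{F}] &= \mathbb{E}[m(z, \theta) - \mathbb{1}(x > Y) \mid \mathcal{F}] \\ &= m(z, \theta) - \mathbb{E}[\mathbb{1}(x > Y) \mid \mathcal{F}] \\ &= m(z, \theta) - \mathbb{P}\big(q_{m(z, \theta_0)}(Y \mid \mathcal{F}) > Y \mid \mathcal{F}\big) \\ &= m(z, \theta) - m(z, \theta_0) \ne 0 \end{aligned}$$

for some $z$, because we assumed that every competing model differs for some $z$, which is $\mathcal{F}$-measurable.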
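For the expectile-based model, the moment condition

$$\mathbb{E}[h(V, \theta) \mid \mathcal{F}] = 0$$

is equivalent to

$$\mathbb{E}\big[\,|\mathbb{1}(x > Y) - m(z, \theta)|\,(x - Y) \mid \mathcal{F}\big] = 0.$$

As $z$ is $\mathcal{F}$-measurable, the absolute value can be resolved easily, and we obtain

$$\mathbb{E}\big[\mathbb{1}(x > Y)(x - Y) - m(z, \theta)(x - Y)(1 - 2\,\mathbb{1}(x < Y)) \mid \mathcal{F}\big] = 0.$$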
We write the expectation in the form of an integral with $F$ the distribution of $Y \mid \mathcal{F}$ and separate at the point $Y = x$ to obtain

$$\int_{-\infty}^{x} (x - y)\,dF(y) - m(z, \theta)\left(\int_{-\infty}^{x} (x - y)\,dF(y) - \int_{x}^{\infty} (x - y)\,dF(y)\right) = 0,$$

which is equivalent to

$$(1 - m(z, \theta)) \int_{-\infty}^{x} (x - y)\,dF(y) = m(z, \theta) \int_{x}^{\infty} (y - x)\,dF(y).$$

The last equation is the definition of the $m(z, \theta)$-expectile for the distribution $P = Y \mid \mathcal{F}$. Consequently, the conditional moment condition holds if and only if $x = e_{m(z, \theta)}(Y \mid \mathcal{F})$. If $\theta \ne \theta_0$, it follows that $m(z, \theta) \ne m(z, \theta_0)$ for some $z$, which in turn implies that $x = e_{m(z, \theta_0)}(Y \mid \mathcal{F}) \ne e_{m(z, \theta)}(Y \mid \mathcal{F})$ cannot satisfy $\mathbb{E}[h(V, \theta) \mid \mathcal{F}] = 0$ for those $z$.

□ 


Proof of Theorem 1. We verify that there exist instruments $w$ such that the conditions of Theorem 2.6 of Newey and McFadden (1994, pp. 2132-2133) are satisfied, which directly implies consistency.

As $S$ is positive definite, its inverse $S^{-1}$ exists and is also positive definite. It follows directly that

$$S^{-1}\,\mathbb{E}[g(v, \theta)] = 0 \iff \mathbb{E}[g(v, \theta)] = 0.$$


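From Lemma 3 it follows that $f(\theta) := \mathbb{E}[h(v, \theta) \mid \mathcal{F}]$ is the constant zero function if and only if $\theta = \theta_0$.

Let $\theta \ne \theta_0$ and $A := \{m(z, \theta_0) > m(z, \theta)\}$. For a fixed $\theta$, it holds that $A \in \mathcal{F}$, because $z$ is $\mathcal{F}$-measurable. The same holds true for its complement $A^c$, as $\mathcal{F}$ is a $\sigma$-algebra. By assumption, alternative models differ from the true model on a set with positive probability:

$$\mathbb{P}\big(m(Z, \theta_0) \ne m(Z, \theta)\big) > 0 \quad \text{for all } \theta \in \Theta \text{ with } \theta \ne \theta_0.$$

It follows that $A$, or the part of $A^c$ on which the models differ, has positive probability. We define $w := (\mathbb{1}(A), -\mathbb{1}(A^c))$. With Lemma 2, $f(\theta)$ is nonzero with one fixed sign on $A$ and has the opposite (weak) sign on $A^c$, so that each component of $f(\theta)\,w$ has a constant sign, and at least one component is nonzero on a set of positive probability. Now, $w$ is $\mathcal{F}$-measurable and

$$\mathbb{E}[g(V, \theta)] = \mathbb{E}\big[\mathbb{E}[h(v, \theta)\, w \mid \mathcal{F}]\big] = \mathbb{E}[f(\theta)\, w] \ne 0.$$

Consequently, there exists an $\mathcal{F}$-measurable instrumental vector $w$ such that the unique identification property i) holds.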

As the parameter space $\Theta$ is compact and the applied functional is in our parameterized class, it follows that $\theta_0 \in \Theta$ and ii) is satisfied.

The continuity of $g(v, \theta)$ in $\theta$ (assumption iii)) follows directly from the continuous model $m(z, \theta)$: for quantiles, we can easily compute that $g(v, \theta) = (\mathbb{1}(x > y) - m(z, \theta))\,w$ a.s.; hence $g$ is continuous in the parameter $\theta$ a.s., as a composition of continuous functions. For expectiles, it holds that

$$\begin{aligned} g(v, \theta) &= |\mathbb{1}(x > y) - m(z, \theta)|\,(x - y)\,w \\ &= \big[\mathbb{1}(x > y) - m(z, \theta)(1 - 2\,\mathbb{1}(x < y))\big](x - y)\,w, \end{aligned}$$

which again is a continuous function in $\theta$.
Finally, $\sup_{\theta \in \Theta} \|g(v_t, \theta)\| \le \|w\|$ for quantiles and $\sup_{\theta \in \Theta} \|g(v_t, \theta)\| \le \|w\|\,\|x - y\| \le \|wx\| + \|wy\|$ for expectiles, which implies $\mathbb{E}[\sup_{\theta \in \Theta} \|g(v_t, \theta)\|] < \infty$ under the regularity conditions. Thus, iv) of Theorem 2.6 in Newey and McFadden (1994) is satisfied, and the GMM estimator is consistent:

$$\hat\theta_T \xrightarrow{p} \theta_0.$$

□ 


C Proof of Theorem 2


Regularity Conditions 3. 

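1. $\theta_0 \in \operatorname{int}(\Theta)$.

2. $m_{(\theta)}(z, \theta)$ almost surely exists and is continuous in a neighborhood of $\theta_0$ for each $z$.

3. $m_{(\theta)}(z, \theta)$ is almost surely uniformly bounded.

4. $\mathbb{E}[\|w_t\|^2] < \infty$, $\mathbb{E}[\|x w_t\|^2] < \infty$ and $\mathbb{E}[\|y w_t\|^2] < \infty$.

5. $G'SG$ is nonsingular.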


Proof of Theorem 2. We verify that the conditions of Theorem 3.4 of Newey and McFadden (1994) are satisfied.

Consistency of the estimator is established with Regularity Conditions 2, and $\theta_0$ is an interior solution in $\Theta$ by assumption.

For quantiles, the partial derivative of the target function $g$ is

$$g_{(\theta)}(v, \theta) = -m_{(\theta)}(z, \theta)\, w,$$

and for expectiles

$$g_{(\theta)}(v, \theta) = -m_{(\theta)}(z, \theta)\,(1 - 2\,\mathbb{1}(x < y))(x - y)\, w = -m_{(\theta)}(z, \theta)\,|x - y|\, w.$$

Both derivatives exist almost surely and are continuous in a neighborhood of $\theta_0$ if the same condition holds for $m_{(\theta)}(z, \theta)$.

As $\|g(v, \theta)\|^2 \le \|w\|^2$ for quantiles and $\|g(v, \theta)\|^2 \le \|(x - y)w\|^2 \le 2\big(\|xw\|^2 + \|yw\|^2\big)$ for expectiles, it follows from the regularity conditions that $\mathbb{E}[\|g(v, \theta_0)\|^2]$ is finite.

Let $K$ be a bound for $m_{(\theta)}(z, \theta)$. As $\|g_{(\theta)}(v, \theta)\| \le K\|w\|$ for quantiles and $\|g_{(\theta)}(v, \theta)\| \le \|K(x - y)w\| \le K(\|xw\| + \|yw\|)$ for expectiles, it follows that $\mathbb{E}[\sup_{\theta \in \Theta} \|g_{(\theta)}(v, \theta)\|] < \infty$.

With the nonsingularity of $G'SG$, Theorem 3.4 of Newey and McFadden (1994) applies, and it follows that

$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Sigma),$$

where $\Sigma = (G'S^{-1}G)^{-1}G'S^{-1}SS^{-1}G(G'S^{-1}G)^{-1} = (G'S^{-1}G)^{-1}$.


□ 


D Proof of Theorem 3


Proof of Theorem 3. This is the well-known J-test of overidentifying restrictions (Hansen, 1982). For completeness, the proof is provided here:
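With Slutsky’s theorem it follows that

$$J(\hat\theta_T) \overset{a}{=} J_0(\hat\theta_T) := T\, g_T(\hat\theta_T)'\, S^{-1} g_T(\hat\theta_T).$$

Using the mean value theorem, the asymptotic distribution of $\hat\theta_T$, and the fact that the asymptotic variance of $\sqrt{T}\,g_T(\theta_0)$ is $S$, it holds that

$$S^{-1/2}\sqrt{T}\,g_T(\hat\theta_T) \xrightarrow{d} \mathcal{N}(0, N) \quad \text{as } T \to \infty,$$

with $N = I_q - P$ and $P = S^{-1/2}G(G'S^{-1}G)^{-1}G'S^{-1/2}$. Here $N$ is idempotent of rank $q - p$, because the rank of $S^{-1/2}G$ is $p$, as the parameter is identified. To see the latter, choose $\varepsilon > 0$ sufficiently small such that, within an $\varepsilon$-neighborhood of $\theta_0$, $g$ equals its first-order Taylor expansion:

$$g(z, \theta) = g(z, \theta_0) + \left[\partial g(z, \theta_0)/\partial\theta\right](\theta - \theta_0) \;\Longrightarrow\; \mathbb{E}[g(Z, \theta)] = \mathbb{E}\left[\partial g(Z, \theta_0)/\partial\theta\right](\theta - \theta_0).$$

As $\theta$ has $p$ dimensions and $\mathbb{E}[g(Z, \theta)] \ne 0$ for $\theta \ne \theta_0$, the rank of the matrix $\mathbb{E}[\partial g(Z, \theta_0)/\partial\theta]$ has to be $p$. It follows that

$$J_0(\hat\theta_T) \xrightarrow{d} u_q'\,[I_q - P(\theta_0)]\,u_q,$$

where $u_q$ denotes a $q$-dimensional random vector with standard normal distribution.

As $J_T$ has the same asymptotic distribution as $J_0$, we just have to show that $u_q'[I_q - P(\theta_0)]\,u_q$ has a $\chi^2_{q-p}$ distribution. See, for example, Rao (1973) for the proof that $u_q' A u_q$ has a chi-square distribution with $k$ degrees of freedom if $A$ is idempotent with $\operatorname{rank}(A) = k$. The projection matrix $I_q - P(\theta_0)$ is idempotent with rank $q - p$, and consequently it holds that

$$J_{0,T} \xrightarrow{d} \chi^2_{q-p}.$$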


□ 
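The key distributional fact used above, that $u_q'[I_q - P]\,u_q$ is $\chi^2_{q-p}$ when $I_q - P$ is an idempotent projection of rank $q - p$, can be checked by simulation. The sketch uses an arbitrary random matrix in place of $S^{-1/2}G$ and arbitrary dimensions $q = 5$, $p = 2$; both are assumptions for illustration. A $\chi^2_3$ variable has mean 3 and variance 6.

```python
import numpy as np

rng = np.random.default_rng(0)
q, p, n_draws = 5, 2, 200_000

# Any full-column-rank q x p matrix stands in for S^{-1/2} G (hypothetical values).
A = rng.standard_normal((q, p))
P = A @ np.linalg.solve(A.T @ A, A.T)    # projection onto the column space of A
N = np.eye(q) - P                        # idempotent with rank q - p

u = rng.standard_normal((n_draws, q))    # rows are draws of u_q ~ N(0, I_q)
quad = np.einsum("ij,jk,ik->i", u, N, u) # u' N u for every draw

print(np.allclose(N @ N, N), np.linalg.matrix_rank(N))
print(quad.mean(), quad.var())           # compare with q - p = 3 and 2 (q - p) = 6
```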


E Example of asymmetric information 

Consider the data-generating process

$$Y_t = Z_t^f + Z_t^u + \varepsilon_t,$$

where the value of the random variable $Z_t^f$ is known exclusively to the forecaster. The value of the random variable $Z_t^u$ is known exclusively to the forecast user and is distributed according to the distribution $G$ with a mean of zero. The innovation $\varepsilon_t$ is unknown to both agents and is distributed according to the distribution $F$ with a mean of zero. All three variables are independent. The forecaster issues a mean forecast

$$x_t = \mathbb{E}[Y_t \mid Z_t^f] = Z_t^f.$$

However, given the information set of the forecast user, which is generated by $Z_t^u$ and the point forecast $x_t$, the point forecast can be interpreted as a state-dependent quantile:

$$\begin{aligned} \mathbb{P}(Y_t < x_t \mid Z_t^u, x_t) &= \mathbb{P}(Z_t^f + Z_t^u + \varepsilon_t < x_t \mid Z_t^u, x_t) \\ &= \mathbb{P}(x_t + Z_t^u + \varepsilon_t < x_t \mid Z_t^u, x_t) \\ &= \mathbb{P}(\varepsilon_t < -Z_t^u \mid Z_t^u) \\ &= F(-Z_t^u). \end{aligned}$$

Given the forecast user’s information set, $x_t$ becomes the quantile at level $\tau = F(-Z_t^u)$. Therefore, under asymmetric information, even a standard mean forecast may become an asymmetric and state-dependent quantile.
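A small simulation illustrates the result. It assumes, for illustration only, that $F$ is the standard normal distribution and that $Z_t^f$ is standard normal as well; conditional on the user's signal, the share of outcomes below the mean forecast should be close to $F(-Z_t^u)$. The function name is hypothetical.

```python
import random
from statistics import NormalDist


def empirical_level(z_user, n=100_000, seed=0):
    """Share of outcomes Y_t below the mean forecast x_t = Z_t^f, given Z_t^u = z_user.

    Assumes, for illustration only, Z_t^f ~ N(0, 1) and eps_t ~ N(0, 1), so F = Phi.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(n):
        z_f = rng.gauss(0.0, 1.0)               # forecaster's private information
        x = z_f                                 # mean forecast E[Y_t | Z_t^f]
        y = z_f + z_user + rng.gauss(0.0, 1.0)  # realization
        below += y < x
    return below / n


levels = {z_u: empirical_level(z_u) for z_u in (-1.0, 0.0, 1.0)}
for z_u, share in levels.items():
    print(z_u, share, NormalDist().cdf(-z_u))   # compare with tau = F(-Z_t^u)
```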


F Derivation of rational point forecasts under information rigidities


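Given the data-generating process in Section 5.2, it holds that

$$\begin{aligned} Y_t &= 0.5\,Y_{t-1} + \sigma_t\varepsilon_t \\ &= 0.5\,(0.5\,Y_{t-2} + \sigma_{t-1}\varepsilon_{t-1}) + \sigma_t\varepsilon_t \\ &= 0.25\,Y_{t-2} + 0.5\,\sigma_{t-1}\varepsilon_{t-1} + \sigma_t\varepsilon_t. \end{aligned}$$

Crucially, for all $t$ the variables $Y_t$, $\varepsilon_t$ and $\sigma_{t+1}$ are $\mathcal{I}_t$-measurable. Thus,

$$\mathbb{E}_{t-2}[Y_t] = 0.25\,Y_{t-2} + 0 + \mathbb{E}_{t-2}\big[\sigma_t\,\mathbb{E}_{t-1}[\varepsilon_t]\big] = 0.25\,Y_{t-2},$$

as $\mathcal{I}_{t-2} \subset \mathcal{I}_{t-1}$ and $\mathbb{E}_{t-1}[\varepsilon_t] = 0$, and

$$\operatorname{Var}_{t-2}[Y_t] = \operatorname{Var}_{t-2}\big[0.25\,Y_{t-2} + 0.5\,\sigma_{t-1}\varepsilon_{t-1} + \sigma_t\varepsilon_t\big] = \operatorname{Var}_{t-2}\big[0.5\,\sigma_{t-1}\varepsilon_{t-1} + \sigma_t\varepsilon_t\big].$$

It is easily seen that

$$\mathbb{E}_{t-2}\big[0.5\,\sigma_{t-1}\varepsilon_{t-1} + \sigma_t\varepsilon_t\big] = 0.5\,\sigma_{t-1}\,\mathbb{E}_{t-2}[\varepsilon_{t-1}] + \mathbb{E}_{t-2}\big[\sigma_t\,\mathbb{E}_{t-1}[\varepsilon_t]\big] = 0.$$

The second moment is

$$\begin{aligned} \mathbb{E}_{t-2}\big[(0.5\,\sigma_{t-1}\varepsilon_{t-1} + \sigma_t\varepsilon_t)^2\big] &= \mathbb{E}_{t-2}\big[0.25\,\sigma_{t-1}^2\varepsilon_{t-1}^2 + \sigma_{t-1}\varepsilon_{t-1}\sigma_t\varepsilon_t + \sigma_t^2\varepsilon_t^2\big] \\ &= 0.25\,\sigma_{t-1}^2\,\mathbb{E}_{t-2}[\varepsilon_{t-1}^2] + \mathbb{E}_{t-2}\big[\sigma_{t-1}\sigma_t\varepsilon_{t-1}\,\mathbb{E}_{t-1}[\varepsilon_t]\big] + \mathbb{E}_{t-2}\big[\sigma_t^2\,\mathbb{E}_{t-1}[\varepsilon_t^2]\big] \\ &= 0.25\,\sigma_{t-1}^2 + \mathbb{E}_{t-2}[\sigma_t^2], \end{aligned}$$

where it holds that

$$\begin{aligned} \mathbb{E}_{t-2}[\sigma_t^2] &= \mathbb{E}_{t-2}\big[0.1 + 0.8\,\sigma_{t-1}^2 + 0.1\,\sigma_{t-1}^2\varepsilon_{t-1}^2\big] \\ &= 0.1 + 0.8\,\sigma_{t-1}^2 + 0.1\,\sigma_{t-1}^2\,\mathbb{E}_{t-2}[\varepsilon_{t-1}^2] \\ &= 0.1 + 0.9\,\sigma_{t-1}^2. \end{aligned}$$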